Data Engineer
1525 Windward Concourse Alpharetta, GA 30005 US
Job Description
Data Engineer - Identity & Fraud Analytics
Here’s what you’ll be doing:
- Cloud Migration Support: Collaborating with analytical teams to facilitate the migration of analytical data and projects to cloud-based environments, ensuring seamless transitions.
- Automated Pipeline Development: Designing and constructing automated data and analytical pipelines for self-service machine learning initiatives. This includes gathering data from diverse sources, integrating, consolidating, and cleansing datasets, and structuring data for use in client-facing solutions.
- Scripting & Solution Development: Developing and implementing analytical scripts designed to operate within cloud platforms, leveraging various core data sources and utilizing robust programming languages.
- Large Dataset Management: Contributing to the design, creation, and interpretation of extensive and highly complex datasets.
- Stakeholder Consultation: Consulting with internal and external partners to understand business requirements, then building datasets and deploying sophisticated big data solutions (under the guidance of senior leads).
- Cross-Functional Enablement: Working with technology and analytics teams to review, understand, and interpret business requirements, then designing and developing necessary functionalities to support advanced identity and fraud analytical needs.
- End-to-End Analytics Development: Contributing to the comprehensive interpretation, design, creation, and deployment of large and intricate analytics-related capabilities (with senior guidance).
Here’s what our ideal candidate has:
- Professional Data Engineering Experience: 3-5 years of professional experience in data engineering or data preparation.
- Core Programming: 3+ years of experience with Python and Structured Query Language (SQL).
- Big Data Environments: Experience working with distributed or cloud-based big data management environments.
- Scripting for ETL: Proficiency in scripting languages for data movement and Extract, Transform, Load (ETL) processes.
- Big Data Querying: Strong querying skills in big data environments (e.g., Hive-like systems, cloud data warehouses); familiarity with cloud data warehouse API libraries for data preparation automation is a plus.
- Advanced Python: Advanced Python programming, including experience with distributed processing frameworks (e.g., PySpark); proficiency in other relevant programming languages (e.g., Scala) is a plus.
- Development Tools: Proficiency with data visualization tools, large-scale tabular data storage systems, and version control platforms. Familiarity with cloud orchestration and data pipeline services is a plus.
- Cloud Fundamentals: Basic cloud platform certifications are a plus.
- Container Orchestration: Knowledge of container orchestration systems (e.g., Kubernetes or similar cloud-native tools) is a plus.
- Machine Learning Exposure: Basic understanding of machine learning concepts (e.g., ensemble models, unsupervised models) with exposure to popular ML frameworks (e.g., TensorFlow, PyTorch) is a plus.
- Graph Data: Basic knowledge of graph mining and graph data models is a plus.
- Data Management Best Practices: Understanding and application of best practices for data management, maintenance, and reporting to implement continuous improvements.
- Communication & Collaboration: Strong oral and written communication skills, and the ability to collaborate effectively with cross-functional partners.
What Would Stand Out:
- Advanced Degree: A Master's degree in Computer Science, Data Science, or a related field.
- Industry Experience: Prior experience in financial services or e-commerce sectors.
- Advanced ML Algorithms (Preferred): Experience with advanced machine learning algorithms such as deep neural networks, support vector machines, boosting algorithms, or random forests.
- Feature Engineering (Preferred): Experience conducting advanced feature engineering and data dimension reduction in a big data environment.