Overview
Data Engineer Manager
Responsibilities
Data Engineering Leadership
- Lead and mentor a team of data engineers in developing and managing scalable, secure, and high-performance data pipelines.
- Define best practices for data ingestion, transformation, and processing in a Lakehouse architecture.
- Drive automation, performance tuning, and cost optimization in cloud data solutions.
Cloud Data Infrastructure & Processing
- Architect and manage AWS-based big data solutions (EMR, EKS, Glue, Redshift).
- Design and maintain Apache Airflow workflows for data orchestration.
- Optimize Spark and distributed data processing frameworks for large-scale workloads.
- Implement streaming solutions (Kafka, Kinesis, Flink) for real-time data processing.
AI/ML & Advanced Analytics
- Collaborate with data scientists and AI/ML teams to build and deploy machine learning models using Amazon SageMaker.
- Support feature engineering, model training, and inference pipelines at scale.
- Enable AI-driven analytics by integrating structured and unstructured data sources.
Business Intelligence & Visualization
- Support BI and reporting teams with optimized data models for Amazon QuickSight and other visualization tools.
- Ensure efficient data aggregation and pre-processing for interactive dashboards and self-service analytics.
- Design, develop, and maintain middleware components that facilitate seamless communication between data platforms, applications, and analytics layers.
Master Data Management (MDM) & Governance
- Implement MDM strategies to ensure clean, consistent, and deduplicated data.
- Establish data governance policies for security, privacy, and compliance (GDPR, HIPAA, etc.).
- Ensure adherence to data quality frameworks across structured and unstructured datasets.
Collaboration & Strategy
- Partner with business teams, AI/ML teams, and analysts to deliver high-value data products.
- Define and maintain data architecture strategies aligned with business goals.
- Enable real-time and batch processing for analytics, reporting, and AI-driven insights.
Technical Expertise:
- Extensive AWS experience with services such as EMR, EKS, Glue, Redshift, S3, Lambda, and SageMaker.
- Proficient in big data processing frameworks (e.g., Spark, Hive, Presto) and Lakehouse architectures.
- Skilled in designing and managing Apache Airflow workflows and other orchestration tools.
- Solid understanding of Master Data Management (MDM) and data governance best practices.
- Proficient with SQL & NoSQL databases (e.g., Redshift, DynamoDB, PostgreSQL, Elasticsearch).
- Proven expertise in middleware development, building components such as REST APIs that integrate data pipelines with applications, analytics platforms, and real-time systems.
- Hands-on experience with GitLab CI/CD, Terraform, AWS CloudFormation templates (CFT), and Infrastructure-as-Code (IaC) methodologies.
- Familiarity with AI/ML pipelines, model deployment, and monitoring using SageMaker.
- Experience with data visualization tools, particularly Amazon QuickSight, for business intelligence.
Qualifications
- Experience with Lakehouse frameworks (Glue Catalog, Iceberg, Delta Lake).
- Expertise in streaming data solutions (Kafka, Kinesis, Flink).
- In-depth understanding of security best practices in AWS data architectures.
- Demonstrated success in driving AI/ML initiatives from ideation to production.
Educational Qualification:
- Bachelor’s degree or higher (UG+) in Computer Science, Data Engineering, Aerospace Engineering, or a related field.
- Advanced degrees (Master’s, PhD) in Data Science or AI/ML are a plus.
REQ-145778