Qualifications & Experience:
· Degree in a relevant field such as Computer Science, Computational Mathematics, Computer Engineering or Software Engineering.
· Specialization or electives in a Data & Analytics field (e.g. Data Warehousing, Data Science, Business Intelligence) is a nice-to-have.
· 2+ years of Data Engineering experience with a focus on quality and automation.
· 2+ years of testing, automation and support experience with analytics applications such as Data Lakes and Data Warehouses (preferably using the Big Data stack and Microsoft Azure cloud infrastructure).
· Seasoned in identifying quality issues across complex data pipelines running on big data technologies and in defining rules for validating the health of the data.
· Experienced with a wide variety of testing methods and tools covering functional, performance and security tests, from individual jobs and pipelines through to end-to-end enterprise flows.
· Well versed in building automated checks that run in a CI pipeline to validate that ETL / ELT jobs are performing as expected (a sketch follows this list).
· Experience with batch and real-time data ingestion/integration tools and technologies handling massive quantities of data (structured and unstructured).
· Exposed to data architecture concepts such as data modelling, Big Data storage, and dimensional modelling.
· Exposed to working with jobs in data pipelines and to defining metrics / measures that ensure the correctness of the data.
· Programming (Python or Scala) and SQL querying skills are required.
· Exposure to Spark and airline industry experience are nice-to-haves.
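
As a rough illustration of the automated checks mentioned above, the sketch below shows a minimal PySpark data-quality step of the kind that might run in CI after an ETL job. The table, column names and thresholds are hypothetical placeholders, not actual acceptance criteria for this role.

```python
# Illustrative only: a minimal post-ETL data-quality check. In practice the
# DataFrame would be read from ADLS / Hive / Delta rather than built inline.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dq-check").getOrCreate()

# Hypothetical ETL output (column names are invented for the example).
df = spark.createDataFrame(
    [(1, "YYZ", 120.0), (2, "YVR", None), (3, "YUL", 95.5)],
    ["booking_id", "airport_code", "fare"],
)

row_count = df.count()
null_fares = df.filter(F.col("fare").isNull()).count()
dup_keys = row_count - df.select("booking_id").distinct().count()

# Simple health rules; the thresholds are assumptions for illustration.
assert row_count > 0, "ETL output is empty"
assert dup_keys == 0, f"{dup_keys} duplicate booking_id values"
assert null_fares / row_count <= 0.05, "null rate on 'fare' exceeds 5%"

print("All data-quality checks passed")
```

A real CI step would report failed rules back to the build server (e.g. Jenkins or Azure DevOps) rather than asserting inline.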
Knowledge/Skills:
· Ability to drive quality of data assets independently.
· Able to deliver solutions (and associated value) iteratively.
· Strong ability to conduct data analysis (e.g. source system identification, data dictionary / metadata collection, data profiling, source-to-target mapping) is preferred (a profiling sketch follows this list).
· Operates with a "You Code It, You Own It" mindset (i.e. supports the products they build).
· Team player, able to collaborate with others to remove blockers, solve complex design problems and debug/resolve issues.
· Is accountable and displays a positive attitude.
· Self-starter with a passion for exploring and learning new technologies, especially those in the Enterprise Data & Analytics space.
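
As a rough illustration of the data-profiling activity mentioned above, the sketch below computes a simple per-column profile with pandas; the sample data and column names are invented for the example.

```python
# Illustrative only: a quick per-column profile of a source extract,
# of the kind that feeds a data dictionary or source-to-target mapping.
import pandas as pd

df = pd.DataFrame(
    {
        "booking_id": [1, 2, 2, 4],
        "airport_code": ["YYZ", "YVR", None, "YUL"],
        "fare": [120.0, 88.0, 88.0, None],
    }
)

# Type, null rate and distinct count for each column.
profile = pd.DataFrame(
    {
        "dtype": df.dtypes.astype(str),
        "null_rate": df.isna().mean(),
        "distinct": df.nunique(),
    }
)
print(profile)
```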
Key Technologies/Tools:
Big Data & Distributed Processing: Spark, Hadoop (HDFS, Hive, HBase, Oozie), Airflow, Apache NiFi, Azure (ADLS, Databricks, Azure Data Factory), Elasticsearch, Avro / Parquet file formats.
Data Analysis, Modelling and Reporting: Snowflake, SQL, Data Vault 2.0, MicroStrategy, Power BI.
Cloud Technologies: Microsoft Azure and Cloudera technology stacks.
Integration and Messaging: Streaming (e.g. Spark Streaming), SnapLogic, TIBCO, Kafka.
CI/CD: Git, Bitbucket, Jenkins, Azure DevOps, Kubernetes, Docker, SonarQube, Gatling.
Languages: Scala, Python.