The position requires a minimum of 5 years of experience.
The job is full-time and remote, based in India.
Responsibilities:
Key responsibilities include designing, developing, and maintaining scalable and efficient data pipelines to collect, process, and store data from various sources.
The role involves integrating and transforming raw data into clean, usable formats for analytics and reporting, ensuring consistency, quality, and integrity.
The candidate will build and optimize data warehouses to store structured and unstructured data, ensuring data is organized, reliable, and accessible.
The position requires developing and managing ETL processes for data ingestion, cleaning, transformation, and loading into databases or data lakes (a minimal sketch of such a step appears after this section).
The candidate will monitor and optimize data pipeline performance to handle large volumes of data with low latency, ensuring reliability and scalability.
Collaboration with other product teams, TSO, and business stakeholders is essential to understand data requirements and ensure that data infrastructure supports analytical needs.
The role includes ensuring that data systems meet security and privacy standards and implementing best practices for data governance, monitoring, and error handling.
The candidate will automate data workflows and establish monitoring systems to detect and resolve data issues proactively.
A broad understanding of the architecture of GEP's entire system, as well as of Analytics, is required.
The candidate must take full accountability for their role, owning development and results.
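To make the pipeline and ETL responsibilities above more concrete, the following is a minimal PySpark sketch of a batch ETL step: it reads raw CSV data, applies basic cleaning and deduplication, and writes a partitioned Delta table. It is illustrative only; the storage paths, column names, and app name are hypothetical, and it assumes a Databricks or otherwise Delta-enabled Spark environment.

```python
# Minimal batch ETL sketch (hypothetical paths and schema).
# Reads raw CSV orders, cleans and deduplicates them, and writes a Delta table.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_batch_etl").getOrCreate()

raw = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("abfss://raw@examplestorage.dfs.core.windows.net/orders/")  # hypothetical ADLS Gen2 path
)

clean = (
    raw
    .dropDuplicates(["order_id"])                       # remove duplicate records
    .filter(F.col("order_id").isNotNull())              # enforce a basic integrity rule
    .withColumn("order_date", F.to_date("order_date"))  # normalize types
    .withColumn("ingested_at", F.current_timestamp())   # audit column
)

(
    clean.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("order_date")                          # partition for downstream query performance
    .save("abfss://curated@examplestorage.dfs.core.windows.net/orders/")
)
```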
Requirements:
Proficiency in programming languages such as Python, PySpark, and Scala is required.
Experience with the Azure environment, including Azure Data Factory, Databricks, Key Vault, and Azure DevOps CI/CD, is necessary.
Knowledge of storage and databases like ADLS Gen 2, Azure SQL DB, and Delta Lake is essential.
Experience in data engineering with Apache Spark, Hadoop, optimization, performance tuning, and data modeling is required.
Familiarity with data sources such as Kafka and MongoDB is preferred (see the streaming ingestion sketch after this list).
The candidate should have experience automating test cases for Big Data and ETL processes (see the test sketch after this list).
A basic understanding of ETL pipelines is necessary.
A strong understanding of AI, machine learning, and data science concepts is highly beneficial.
Strong analytical and problem-solving skills with attention to detail are required.
The ability to work independently and as part of a team in a fast-paced environment is essential.
Excellent communication skills are necessary to collaborate with both technical and non-technical stakeholders.
Experience in designing and implementing scalable and optimized data architectures following best practices is required.
A strong understanding of data warehousing concepts, data lakes, and data modeling is necessary.
Familiarity with data governance, data quality, and privacy regulations is important.
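As an illustration of the streaming and storage requirements above, here is a minimal Spark Structured Streaming sketch that reads JSON events from Kafka and appends them to a Delta table on ADLS Gen2. The broker address, topic, schema, and paths are hypothetical, and the example assumes the Spark Kafka connector and Delta Lake are available (as on Databricks).

```python
# Minimal streaming ingestion sketch (hypothetical broker, topic, schema, and paths).
# Reads JSON events from Kafka and appends them to a Delta table on ADLS Gen2.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("events_stream_ingest").getOrCreate()

event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_time", TimestampType()),
])

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # hypothetical broker
    .option("subscribe", "events")                      # hypothetical topic
    .load()
    .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

query = (
    events.writeStream
    .format("delta")
    .option("checkpointLocation",
            "abfss://curated@examplestorage.dfs.core.windows.net/_checkpoints/events/")
    .outputMode("append")
    .start("abfss://curated@examplestorage.dfs.core.windows.net/events/")
)
```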
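Likewise, test automation for ETL logic might look like the following pytest sketch, which checks a small cleaning function against an in-memory DataFrame. The function, fixture, and test names are hypothetical and shown only to indicate the expected style of work.

```python
# Hypothetical pytest sketch for an ETL transformation:
# verifies that a cleaning step drops null keys and duplicate rows.
import pytest
from pyspark.sql import SparkSession, functions as F


def clean_orders(df):
    """Example transformation under test: drop null and duplicate order_ids."""
    return df.filter(F.col("order_id").isNotNull()).dropDuplicates(["order_id"])


@pytest.fixture(scope="module")
def spark():
    return SparkSession.builder.master("local[1]").appName("etl_tests").getOrCreate()


def test_clean_orders_removes_nulls_and_duplicates(spark):
    raw = spark.createDataFrame(
        [("1", 10.0), ("1", 10.0), (None, 5.0)],
        ["order_id", "amount"],
    )
    result = clean_orders(raw)
    assert result.count() == 1
    assert result.first()["order_id"] == "1"
```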
Benefits:
The position offers a full-time remote work opportunity from India.
The role provides the chance to work with cutting-edge technologies in data engineering.
The candidate will have the opportunity to collaborate with diverse teams and stakeholders.
The position allows for professional growth and development in a fast-paced environment.
The role includes the potential for ownership of projects and accountability for results.