The Staff Software Engineer, Data Ingestion will be a critical individual contributor responsible for designing data collection strategies and for developing and maintaining robust, scalable data pipelines.
This role sits at the heart of our data ecosystem, delivering analytical software solutions that provide timely, accurate, and complete data for insights, products, and operational efficiency.
Key responsibilities include designing, developing, and maintaining high-performance, fault-tolerant data ingestion pipelines using Python.
The engineer will integrate with diverse data sources such as databases, APIs, streaming platforms, and cloud storage.
They will implement data transformation and cleansing logic during ingestion to ensure data quality.
Monitoring and troubleshooting data ingestion pipelines is essential, with a focus on identifying and resolving issues promptly.
Collaboration with database engineers to optimize data models for efficient downstream consumption is required.
The engineer will evaluate and propose new technologies or frameworks to improve ingestion efficiency and reliability.
Developing and implementing self-healing mechanisms for data pipelines to ensure continuity is a key task.
Defining and upholding SLAs and SLOs for data freshness, completeness, and availability is expected.
Participation in on-call rotation as needed for critical data pipeline issues is part of the role.
Requirements:
Candidates must have 6+ years of experience in the software development industry with a background in computer science.
Extensive expertise in Python is required, with a proven track record of developing robust, production-grade applications.
Proven experience collecting data from various sources, including REST and GraphQL APIs (including OAuth-secured endpoints), Kafka, S3, and SFTP, is necessary.
A strong understanding of distributed systems concepts, including designing for scale, performance optimization, and fault tolerance, is essential.
Experience with major cloud providers such as AWS or GCP and their data-related services (e.g., S3, EC2, Lambda, SQS, Kafka, Cloud Storage, GKE) is required.
A solid understanding of relational databases, including SQL, schema design, indexing, and query optimization, is necessary; experience with analytical (OLAP) systems or big-data frameworks such as Hadoop is a plus.
Experience with monitoring tools such as Prometheus and Grafana, along with setting up effective alerts, is required.
Proficiency with Git for version control is necessary.
Experience with Docker and Kubernetes is a plus.
Familiarity with real-time data processing using technologies such as Kafka, Flink, and Spark Streaming is also a plus.
Benefits:
The position offers the opportunity to work on critical data ingestion projects that impact the entire data ecosystem.
Employees will have the chance to collaborate with a talented team of engineers and database experts.
The role offers opportunities for professional growth, including the chance to evaluate and implement new technologies.
Participation in on-call rotations allows for hands-on experience with real-time problem-solving in critical situations.
The company promotes a culture of innovation and continuous improvement in data processing and ingestion strategies.