We are seeking a highly skilled Lead Data Engineer with strong expertise in PySpark, SQL, and Python, as well as a solid understanding of ETL and data warehousing principles.
The ideal candidate will have a proven track record of designing, building, and maintaining scalable data pipelines in a collaborative and fast-paced environment.
Responsibilities:
Key responsibilities include designing and developing scalable data pipelines using PySpark to support analytics and reporting needs.
The candidate will write efficient SQL and Python code to transform, cleanse, and optimize large datasets.
Collaboration with machine learning engineers, product managers, and developers to understand data requirements and deliver solutions is essential.
The role involves implementing and maintaining robust ETL processes to integrate structured and semi-structured data from various sources.
Ensuring data quality, integrity, and reliability across pipelines and systems is a critical responsibility.
Participation in code reviews, troubleshooting, and performance tuning is expected.
The candidate will work independently and proactively to identify and resolve data-related issues.
The role may also involve contributing to Azure-based data solutions, including ADF, Synapse, ADLS, and other services.
Supporting cloud migration initiatives and DevOps practices may likewise be part of the role.
Providing guidance on best practices and mentoring junior team members when needed is part of the job.
Requirements:
Candidates must have 8+ years of overall experience, including collaboration with cross-functional partners such as machine learning engineers, developers, product managers, and analytics teams.
At least 3 years of hands-on experience developing and managing data pipelines using PySpark is required.
Strong programming skills in Python and SQL are essential.
A deep understanding of ETL processes and data warehousing fundamentals is necessary.
Candidates should be self-driven, resourceful, and comfortable working in dynamic, fast-paced environments.
Advanced written and spoken English fluency is required for this position (CEFR level B2, C1, or C2 only).
Additional nice-to-have qualifications include Databricks certification and experience with Azure-native services such as Azure Data Lake Storage (ADLS), Azure Data Factory (ADF), and Azure Synapse Analytics.
Familiarity with Event Hub, IoT Hub, Azure Stream Analytics, Azure Analysis Services, and Cosmos DB is beneficial.
A basic understanding of SAP HANA and intermediate-level experience with Power BI is preferred.
Knowledge of DevOps, CI/CD pipelines, and cloud migration best practices is also advantageous.
Mandatory requirements include 3+ years of experience with PySpark/Python, ETL, and data warehousing processes; proven leadership experience; and residence in Central or South America.
Benefits:
The position is 100% remote for nearshore candidates located in Central or South America.
This is an independent contractor engagement: it does not include PTO, tax withholding, or insurance, and payment is made monthly based on hours worked.
The initial contract/project duration is 6 months, with the possibility of extension based on performance.
Full-time working hours are Monday to Friday, 8 hours per day (40 hours per week), from 8:00 AM to 5:00 PM Pacific Time (U.S.).
Contractors are required to use their own laptop/PC.
The expected start date is as soon as possible.
Payment methods include international bank transfer, PayPal, Wise, Payoneer, etc.
Joining the team means being part of an innovative group shaping the future of technology, working in a collaborative and inclusive environment, and gaining access to professional development and career growth opportunities.