Please let StrongDM know you found this job on RemoteYeah.
This helps us grow 🌱.
Description:
StrongDM is seeking a highly skilled Principal Data Engineer with extensive experience in building cloud data lakes and architecting large-scale data platforms.
The role involves designing and implementing data architectures that support diverse use cases, from AI/ML to business intelligence (BI).
The ideal candidate will have deep expertise in open table and file formats such as Apache Iceberg and Apache Parquet, as well as other open standards.
Responsibilities include leading the design and development of scalable data lake architectures on cloud platforms (e.g., AWS, Azure, GCP) optimized for both structured and unstructured data.
The candidate will implement and manage table formats to store and process large datasets efficiently.
The role requires architecting and building large-scale, highly available data platforms that support real-time analytics, reporting, and AI workloads.
The candidate will leverage various compute engines (e.g., Apache Spark, Dremio, Presto, Trino) to support complex business intelligence and AI use cases, optimizing performance and cost-efficiency.
The candidate will collaborate with AI and machine learning teams to design data pipelines that enable AI model training, deployment, and real-time inference.
The candidate will establish best practices for data governance, ensuring data quality, security, and compliance with industry regulations.
The role includes providing technical leadership to data engineering teams and mentoring junior engineers, fostering a culture of continuous learning and innovation.
Requirements:
Strong knowledge of big data processing frameworks and data streaming technologies is required.
Experience collaborating with AI/ML teams, building data pipelines that feed AI models, and ensuring data readiness for machine learning workflows is essential.
Proven experience in architecting and building data lakes on cloud platforms (AWS, Azure, GCP) is necessary.
In-depth knowledge of Apache Iceberg, Apache Parquet, and other open standards for efficient data storage and query optimization is required.
Expertise with compute engines such as Apache Spark, Dremio, or Presto, including hands-on experience optimizing them for business intelligence and AI workloads, is needed.
A proven track record of leading large-scale data engineering projects and mentoring teams is essential.
Proficiency in programming languages such as Python, Java, or Scala, and SQL for querying and managing large datasets is required.
Previous experience working directly with AI or machine learning teams is preferred.
A deep understanding of distributed systems and the challenges of scaling data infrastructure in large, dynamic environments is preferred.
Familiarity with modern data warehousing solutions such as Snowflake or Redshift is preferred.
Benefits:
The compensation for this position ranges from $190,000 to $230,000, depending on experience, along with equity.
Company-sponsored benefits include medical, dental, and vision insurance, which are free to employees and their dependents.
Additional benefits include a 401(k), HSA, FSA, short- and long-term disability coverage, and life insurance.
Employees receive 6 weeks of combined accrued vacation and sick time, along with volunteer days and standard holidays.
The company offers 24 weeks of paid parental leave for everyone, plus one month of transition time upon returning and a childcare stipend for the first year.
There is a generous monthly and annual stipend for internet and home office expenses.
Apply now