Remote Lead Data Engineer

Please let StrongDM know you found this job on RemoteYeah. This helps us grow 🌱.

Description:

  • StrongDM is seeking a highly skilled Principal Data Engineer with extensive experience in building cloud data lakes and architecting large-scale data platforms.
  • The role involves designing and implementing data architectures that support diverse use cases, from AI/ML to business intelligence (BI).
  • The ideal candidate will have deep expertise in open table and file formats such as Apache Iceberg and Apache Parquet, along with other open standards.
  • Responsibilities include leading the design and development of scalable data lake architectures on cloud platforms (e.g., AWS, Azure, GCP) optimized for both structured and unstructured data.
  • The candidate will implement and manage open table formats to store and process large datasets efficiently.
  • The role requires architecting and building large-scale, highly available data platforms that support real-time analytics, reporting, and AI workloads.
  • The candidate will leverage various compute engines (e.g., Apache Spark, Dremio, Presto, Trino) to support complex business intelligence and AI use cases, optimizing performance and cost-efficiency.
  • Collaboration with AI and machine learning teams is essential to design data pipelines that enable AI model training, deployment, and real-time inference.
  • The candidate will establish best practices for data governance, ensuring data quality, security, and compliance with industry regulations.
  • The role includes providing technical leadership to data engineering teams and mentoring junior engineers, fostering a culture of continuous learning and innovation.

Requirements:

  • Strong knowledge of big data processing frameworks and data streaming technologies is required.
  • Experience collaborating with AI/ML teams, building data pipelines that feed AI models, and ensuring data readiness for machine learning workflows is essential.
  • Proven experience in architecting and building data lakes on cloud platforms (AWS, Azure, GCP) is necessary.
  • In-depth knowledge of Apache Iceberg, Apache Parquet, and other open standards for efficient data storage and query optimization is required.
  • Expertise in using compute engines such as Apache Spark, Dremio, Presto, or similar, with hands-on experience in optimizing them for business intelligence and AI workloads is needed.
  • A proven track record of leading large-scale data engineering projects and mentoring teams is essential.
  • Proficiency in programming languages such as Python, Java, or Scala, and SQL for querying and managing large datasets is required.
  • Previous experience working directly with AI or machine learning teams is preferred.
  • A deep understanding of distributed systems and the challenges of scaling data infrastructure in large, dynamic environments is preferred.
  • Familiarity with modern data warehousing solutions such as Snowflake or Redshift is preferred.

Benefits:

  • Compensation for this position ranges from $190,000 to $230,000 per year, depending on experience, along with an equity package.
  • Company-sponsored medical, dental, and vision insurance is free for employees and their dependents.
  • Additional benefits include a 401K, HSA, FSA, short/long-term disability coverage, and life insurance.
  • Employees receive 6 weeks of combined accrued vacation and sick time, along with volunteer days and standard holidays.
  • The company offers 24 weeks of paid parental leave for everyone, plus a one-month transition period back to work and a childcare stipend for the first year.
  • There is a generous monthly and annual stipend for internet and home office expenses.