
Remote Data Engineer

at Machinify


Description:

  • Machinify is the leading provider of AI-powered software products that transform healthcare claims and payment operations.
  • The company addresses the more than $200B in claims mispayments in the healthcare industry that create waste and frustration for patients, providers, and payers.
  • As a Data Engineer, you will transform raw external data into powerful, trusted datasets that drive payment, product, and operational decisions.
  • You will collaborate with product managers, data scientists, subject matter experts, engineers, and customer teams to build, scale, and refine production pipelines, ensuring data is accurate, observable, and actionable.
  • Your role will involve onboarding new customers and integrating their raw data into internal models.
  • The pipelines you create will power the company’s ML models, dashboards, and core product experiences.
  • You will design and implement robust, production-grade pipelines using Python, Spark SQL, and Airflow to process high-volume file-based datasets (CSV, Parquet, JSON); an orchestration sketch follows this list.
  • You will lead efforts to canonicalize raw healthcare data into internal models and own the full lifecycle of core pipelines from file ingestion to validated, queryable datasets.
  • You will build resilient transformation logic with data quality checks, validation layers, and observability (see the validation sketch below the list).
  • You will refactor and scale existing pipelines, tune Spark jobs, and implement schema enforcement aligned with internal data standards.
  • You will monitor pipeline health, participate in on-call rotations, and debug production data flow issues.
  • You will contribute to the evolution of the data platform and build streaming pipelines where needed to support near-real-time data needs (see the streaming sketch below the list).
  • You will help develop and champion internal best practices around pipeline development and data modeling.
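
For concreteness, here is a minimal sketch of the kind of pipeline this stack implies, assuming Airflow 2.4+ with PySpark available on the workers. The DAG id, task name, S3 paths, and column names are all invented for illustration and do not come from the posting:

```python
# Minimal Airflow DAG sketching an ingest -> transform flow.
# All paths, ids, and column names below are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

RAW_CLAIMS_PATH = "s3://example-bucket/raw/claims/"          # hypothetical
CURATED_CLAIMS_PATH = "s3://example-bucket/curated/claims/"  # hypothetical


def transform_claims():
    """Read raw CSV drops, apply a Spark SQL transform, write Parquet."""
    # Imported here so the DAG file parses even where PySpark isn't installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("claims_transform").getOrCreate()
    spark.read.option("header", True).csv(RAW_CLAIMS_PATH) \
        .createOrReplaceTempView("raw_claims")
    curated = spark.sql("""
        SELECT claim_id,
               CAST(billed_amount AS DECIMAL(12, 2)) AS billed_amount,
               TO_DATE(service_date, 'yyyy-MM-dd')   AS service_date
        FROM raw_claims
        WHERE claim_id IS NOT NULL
    """)
    curated.write.mode("overwrite").parquet(CURATED_CLAIMS_PATH)


with DAG(
    dag_id="claims_ingest",        # hypothetical
    start_date=datetime(2024, 1, 1),
    schedule="@daily",             # Airflow 2.4+ keyword
    catchup=False,
) as dag:
    PythonOperator(task_id="transform_claims", python_callable=transform_claims)
```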
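
The schema enforcement and validation layers mentioned above could look like the following in PySpark; the schema, the 1% failure threshold, and the S3 path are assumptions, not details from the posting:

```python
# Sketch of schema enforcement plus simple data quality checks in PySpark.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import (
    DateType, DecimalType, StringType, StructField, StructType,
)

# Hypothetical canonical schema, enforced at read time rather than inferred.
claims_schema = StructType([
    StructField("claim_id", StringType(), nullable=False),
    StructField("billed_amount", DecimalType(12, 2), nullable=True),
    StructField("service_date", DateType(), nullable=True),
])

spark = SparkSession.builder.appName("claims_validation").getOrCreate()

# FAILFAST raises on malformed rows instead of silently nulling them out.
df = (
    spark.read
    .schema(claims_schema)
    .option("header", True)
    .option("dateFormat", "yyyy-MM-dd")
    .option("mode", "FAILFAST")
    .csv("s3://example-bucket/raw/claims/")  # hypothetical path
)

total = df.count()
bad = df.filter(F.col("claim_id").isNull() | (F.col("billed_amount") < 0)).count()

# Hypothetical invariant: fail the run if more than 1% of rows are invalid.
if total and bad / total > 0.01:
    raise ValueError(f"Data quality check failed: {bad}/{total} invalid rows")
```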
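
For the near-real-time piece, one plausible shape is Spark Structured Streaming reading from Kafka (the requirements list Kafka and SQS as possible tools). The broker address, topic, schema, and checkpoint path below are invented:

```python
# Sketch of a near-real-time pipeline with Spark Structured Streaming.
# Requires the spark-sql-kafka package on the classpath.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType, StructField, StructType

spark = SparkSession.builder.appName("claims_stream").getOrCreate()

event_schema = StructType([
    StructField("claim_id", StringType()),
    StructField("status", StringType()),
])

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical
    .option("subscribe", "claim-status-events")        # hypothetical topic
    .load()
    # Kafka values arrive as bytes; decode and parse the JSON payload.
    .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

(
    events.writeStream
    .format("parquet")
    .option("path", "s3://example-bucket/stream/claim_status/")  # hypothetical
    .option("checkpointLocation", "s3://example-bucket/checkpoints/claim_status/")
    .trigger(processingTime="1 minute")
    .start()
    .awaitTermination()
)
```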

Requirements:

  • You must have 4+ years of experience as a Data Engineer (or equivalent), building production-grade pipelines.
  • Strong expertise in Python, Spark SQL, and Airflow is required.
  • You should have experience processing large-scale file-based datasets (CSV, Parquet, JSON, etc.) in production environments.
  • Experience mapping and standardizing raw external data into canonical models is necessary (a mapping sketch follows this list).
  • Familiarity with AWS (or another cloud) is required, including file storage and distributed compute concepts.
  • You should have experience onboarding new customers and integrating external customer data with non-standard formats.
  • The ability to work across teams, manage priorities, and own complex data workflows with minimal supervision is essential.
  • Strong written and verbal communication skills are necessary to explain technical concepts to non-engineering partners.
  • You should be comfortable both designing pipelines from scratch and improving existing ones.
  • Experience working with large-scale or messy datasets (healthcare, financial, logs, etc.) is required.
  • Experience building streaming pipelines with tools such as Kafka or SQS, or a willingness to learn them, is preferred.
  • Bonus: familiarity with healthcare data (837, 835, EHR, UB04, claims normalization).
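
As a toy illustration of the canonical-mapping requirement: a per-customer column mapping applied to a raw feed in PySpark. The mapping dict, column names, and paths are invented, and real 837/835 feeds would need a proper EDI parser upstream, which is out of scope here:

```python
# Sketch of mapping one customer's raw feed onto a canonical model.
# All names and paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Per-customer mapping from source column -> canonical column (hypothetical).
CUSTOMER_A_MAPPING = {
    "ClaimNbr": "claim_id",
    "BilledAmt": "billed_amount",
    "DOS": "service_date",
}

spark = SparkSession.builder.appName("canonicalize").getOrCreate()
raw = spark.read.parquet("s3://example-bucket/raw/customer_a/")  # hypothetical

# Rename source columns to canonical names and tag the record's origin.
canonical = raw.select(
    [F.col(src).alias(dst) for src, dst in CUSTOMER_A_MAPPING.items()]
).withColumn("source_system", F.lit("customer_a"))

canonical.write.mode("append").parquet("s3://example-bucket/canonical/claims/")
```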

Benefits:

  • You will have the opportunity to make a real impact, as your pipelines will directly support decision-making and claims payment outcomes from day one.
  • The role offers high visibility, allowing you to partner with ML, Product, Analytics, Platform, Operations, and Customer teams on critical data initiatives.
  • You will take end-to-end ownership of the core datasets powering the platform, driving their full lifecycle.
  • Your work will contribute to successful customer onboarding and data integration, giving you direct customer-facing impact.