Remote MLOps Engineer (SRE)

at Metova

Posted 2 days ago

Description:

  • A leading company in Mexico specializing in accounting software is looking for a highly skilled MLOps Engineer (SRE) to join the team.
  • The role involves designing and operating observability solutions for ML models in production, including monitoring, alerts, and traceability.
  • Responsibilities include developing dashboards and metrics to evaluate model performance, cost, and stability, as well as implementing structured logging and tooling for drift monitoring, data quality checks, and inference error tracking.
  • The MLOps Engineer will collaborate with data science and product teams to detect and mitigate incidents related to models in production.
  • The position requires applying SRE practices such as chaos engineering, stress testing, testing in staging environments, and continuous integration.

Requirements:

  • Candidates must have 4+ years of experience as an SRE, DevOps, or Platform Engineer on ML projects.
  • Fluency in technical English is required.
  • Experience with orchestrators such as Airflow or Kubeflow, or with experiment tracking tools (MLflow, Weights & Biases), is necessary.
  • Experience in high-transaction environments such as banking, accounting, payroll, or logistics is a plus.
  • Knowledge of model monitoring frameworks such as Evidently, Arize AI, WhyLabs, or similar is essential.
  • Proficiency in Prometheus, Grafana, ELK/EFK, OpenTelemetry, or Datadog is required.
  • Candidates must be proficient in Kubernetes, Docker, Helm, and infrastructure automation tools (Terraform, Pulumi).
  • Solid fundamentals in CI/CD for ML pipelines, including testing, validation, and rollback, are necessary.

Benefits:

  • The job offers an opportunity to work with a leading company in the accounting software industry.
  • Employees will have the chance to collaborate with talented data science and product teams.
  • The position provides a platform to apply and enhance SRE practices in a dynamic environment.
  • The role allows for professional growth in the field of MLOps and machine learning engineering.