A leading company in Mexico specializing in accounting software is looking for a highly skilled MLOps Engineer (SRE) to join the team.
The role involves designing and operating observability solutions for ML models in production, including monitoring, alerts, and traceability.
Responsibilities include developing dashboards and metrics to evaluate model performance, cost, and stability, as well as implementing tools for structured logging, drift monitoring, data quality checks, and inference error tracking.
The MLOps Engineer will collaborate with data science and product teams to detect and mitigate incidents related to models in production.
The position requires applying SRE practices such as chaos engineering, stress testing, testing in staging environments, and continuous integration.
Requirements:
Candidates must have 4+ years of experience as an SRE, DevOps, or Platform Engineer working on ML projects.
Fluency in technical English is required.
Experience with orchestrators such as Airflow or Kubeflow, or with experiment tracking tools such as MLflow or Weights & Biases, is necessary.
Experience in high-transaction environments such as banking, accounting, payroll, or logistics is a plus.
Knowledge of model monitoring frameworks such as Evidently, Arize AI, WhyLabs, or similar is essential.
Proficiency in Prometheus, Grafana, ELK/EFK, OpenTelemetry, or Datadog is required.
Candidates must be proficient in Kubernetes, Docker, Helm, and infrastructure automation tools (Terraform, Pulumi).
Solid fundamentals in CI/CD for ML pipelines, including testing, validation, and rollback, are necessary.
Benefits:
The job offers an opportunity to work with a leading company in the accounting software industry.
Employees will have the chance to collaborate with talented data science and product teams.
The position provides a platform to apply and enhance SRE practices in a dynamic environment.
The role allows for professional growth in the field of MLOps and machine learning engineering.