Remote Senior MLOps Engineer

Apply now
Please let Fortytwo know you found this job on RemoteYeah. This helps us grow 🌱.

Description:

  • The Senior MLOps Engineer will deploy scalable, production-ready ML services with optimized infrastructure and auto-scaling Kubernetes clusters.
  • The role involves optimizing GPU resources using MIG (Multi-Instance GPU) and NOS (Node Offloading System).
  • The engineer will manage cloud storage (e.g., S3) to ensure high availability and performance.
  • Responsibilities include integrating state-of-the-art ML techniques, such as LoRA (Low-Rank Adaptation) and model merging, into production workflows.
  • The engineer will work with SOTA ML codebases and adapt them to organizational needs.
  • The role includes deploying and managing large language models (LLMs), small language models (SLMs), and large multimodal models (LMMs).
  • The engineer will serve ML models using technologies like Triton Inference Server.
  • They will leverage solutions such as vLLM, TGI (Text Generation Inference), and other state-of-the-art serving frameworks.
  • The engineer will optimize models with ONNX and TensorRT for efficient deployment.
  • They will develop Retrieval-Augmented Generation (RAG) systems integrating spreadsheet, math, and compiler processors.
  • The role requires setting up monitoring and logging solutions using Grafana, Prometheus, Loki, Elasticsearch, and OpenSearch.
  • The engineer will write and maintain CI/CD pipelines using GitHub Actions for seamless deployment processes.
  • They will create Helm templates for rapid Kubernetes node deployment.
  • The engineer will automate workflows using cron jobs and Airflow DAGs.
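
To give a flavor of the CI/CD and Helm duties listed above, a deployment workflow of the kind this role maintains might look like the following minimal GitHub Actions sketch. The repository layout, registry hostname, image name, and chart path are hypothetical, not part of the posting:

```yaml
# Hypothetical workflow: build a model-serving image and upgrade its
# Helm release whenever main is updated.
name: deploy-ml-service
on:
  push:
    branches: [main]
jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build and push serving image
        run: |
          docker build -t registry.example.com/ml/inference:${{ github.sha }} .
          docker push registry.example.com/ml/inference:${{ github.sha }}
      - name: Deploy with Helm
        run: |
          helm upgrade --install inference ./charts/inference \
            --set image.tag=${{ github.sha }}
```

In practice a pipeline like this would also run tests and linting before the build step, and credentials for the registry and cluster would come from repository secrets rather than being inlined.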

Requirements:

  • A Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field is required.
  • Proficiency in Kubernetes, Helm, and containerization technologies is necessary.
  • Experience with GPU optimization (MIG, NOS) and cloud platforms (AWS, GCP, Azure) is required.
  • Strong knowledge of monitoring tools (Grafana, Prometheus) and scripting languages (Python, Bash) is essential.
  • Hands-on experience with CI/CD tools and workflow management systems is needed.
  • Familiarity with Triton Inference Server, ONNX, and TensorRT for model serving and optimization is required.
  • Preferred qualifications include 5+ years of experience in MLOps or ML engineering roles.
  • Experience with advanced ML techniques, such as multi-sampling and dynamic temperatures, is preferred.
  • Knowledge of distributed training and large model fine-tuning is a plus.
  • Proficiency in Go or Rust programming languages is preferred.
  • Experience designing and implementing highly secure MLOps pipelines, including secure model deployment and data encryption, is preferred.

Benefits:

  • Working at Fortytwo offers the opportunity to engage in meaningful AI research, focusing on decentralized inference, multi-agent systems, and efficient model deployment.
  • Employees will have the chance to build scalable and sustainable AI systems that reduce reliance on massive compute clusters, making advanced models more efficient, accessible, and cost-effective.
  • The role provides the opportunity to collaborate with a highly technical team of engineers and researchers who are deeply experienced, intellectually curious, and motivated by solving hard problems.
  • Fortytwo looks for individuals who thrive in research-driven environments, value autonomy, and want to work on foundational AI challenges.