Remote Senior MLOps Engineer

Please let Fortytwo know you found this job on RemoteYeah. This helps us grow 🌱.

Description:

  • The Senior MLOps Engineer will deploy scalable, production-ready ML services with optimized infrastructure and auto-scaling Kubernetes clusters.
  • The role involves optimizing GPU resources using MIG (Multi-Instance GPU) and NOS (Node Offloading System).
  • The engineer will manage cloud storage (e.g., S3) to ensure high availability and performance.
  • Responsibilities include integrating state-of-the-art ML techniques, such as LoRA (Low-Rank Adaptation) and model merging, into existing workflows.
  • The engineer will work with SOTA ML codebases and adapt them to organizational needs.
  • The role includes deploying and managing large language models (LLMs), small language models (SLMs), and large multimodal models (LMMs).
  • Serving ML models using technologies like Triton Inference Server is also part of the job.
  • The engineer will leverage solutions such as vLLM, TGI (Text Generation Inference), and other state-of-the-art serving frameworks.
  • They will optimize models with ONNX and TensorRT for efficient deployment.
  • Developing Retrieval-Augmented Generation (RAG) systems integrating spreadsheet, math, and compiler processors is required.
  • The engineer will set up monitoring and logging solutions using Grafana, Prometheus, Loki, Elasticsearch, and OpenSearch.
  • Writing and maintaining CI/CD pipelines using GitHub Actions for seamless deployment processes is expected.
  • Creating Helm templates for rapid Kubernetes node deployment is part of the responsibilities.
  • Automating workflows using cron jobs and Airflow DAGs is also required.
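To give a concrete flavor of the last bullet, the sketch below is a minimal pure-Python stand-in (using the standard-library `graphlib`) for the kind of task ordering an Airflow DAG encodes. The task names are hypothetical, not taken from the posting; in production each would be an Airflow operator.

```python
from graphlib import TopologicalSorter

# Hypothetical nightly model-refresh workflow: each key lists the
# tasks it depends on, mirroring upstream edges in an Airflow DAG.
workflow = {
    "pull_checkpoint": set(),
    "export_onnx":     {"pull_checkpoint"},
    "build_engine":    {"export_onnx"},      # e.g. a TensorRT build step
    "deploy_triton":   {"build_engine"},
    "smoke_test":      {"deploy_triton"},
}

# static_order() yields tasks so every dependency runs first.
order = list(TopologicalSorter(workflow).static_order())
```

In a real pipeline the same dependency structure would be expressed with Airflow's `>>` operator between tasks, and the scheduler (rather than a manual loop) would execute them in this order.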

Requirements:

  • A Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field is required.
  • Proficiency in Kubernetes, Helm, and containerization technologies is necessary.
  • Experience with GPU optimization (MIG, NOS) and cloud platforms (AWS, GCP, Azure) is essential.
  • Strong knowledge of monitoring tools (Grafana, Prometheus) and scripting languages (Python, Bash) is required.
  • Hands-on experience with CI/CD tools and workflow management systems is necessary.
  • Familiarity with Triton Inference Server, ONNX, and TensorRT for model serving and optimization is required.
  • Preferred qualifications include 5+ years of experience in MLOps or ML engineering roles.
  • Experience with advanced ML techniques, such as multi-sampling and dynamic temperatures, is preferred.
  • Knowledge of distributed training and large model fine-tuning is a plus.
  • Proficiency in Go or Rust programming languages is preferred.
  • Experience designing and implementing highly secure MLOps pipelines, including secure model deployment and data encryption, is preferred.
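The "multi-sampling and dynamic temperatures" item above can be illustrated with a small self-contained sketch: drawing several samples from one logit vector while ramping the softmax temperature across draws. The logits, temperature schedule, and function names are illustrative assumptions, not part of the posting.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Temperature-scaled softmax over a plain list of logits."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def multi_sample(logits, n_samples, t_start=0.2, t_end=1.2, rng=None):
    """Draw n_samples token indices, linearly ramping the temperature
    from t_start (near-greedy) to t_end (more diverse)."""
    rng = rng or random.Random(0)
    samples = []
    for i in range(n_samples):
        t = t_start + (t_end - t_start) * i / max(n_samples - 1, 1)
        probs = softmax_with_temperature(logits, t)
        token = rng.choices(range(len(logits)), weights=probs, k=1)[0]
        samples.append((t, token))
    return samples
```

Low temperatures concentrate probability on the argmax token (near-deterministic decoding), while higher temperatures flatten the distribution, so a ramp like this trades off consistency in early samples against diversity in later ones.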

Benefits:

  • Working at Fortytwo provides the opportunity to engage in meaningful AI research focused on decentralized inference, multi-agent systems, and efficient model deployment.
  • Employees will have the chance to build scalable and sustainable AI systems that reduce reliance on massive compute clusters, making advanced models more efficient, accessible, and cost-effective.
  • The role offers collaboration with a highly technical team of engineers and researchers who are deeply experienced, intellectually curious, and motivated by solving hard problems.
  • The company values individuals who thrive in research-driven environments, value autonomy, and want to work on foundational AI challenges.