Description:
The Senior MLOps Engineer will deploy scalable, production-ready ML services on optimized infrastructure with auto-scaling Kubernetes clusters.
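By way of illustration, here is a minimal sketch of configuring autoscaling programmatically with the official Kubernetes Python client; the deployment name, namespace, and thresholds are placeholders, not details from the posting:

```python
from kubernetes import client, config

# Assumes local kubeconfig credentials; all names and thresholds are illustrative.
config.load_kube_config()

hpa = client.V1HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="inference-hpa"),
    spec=client.V1HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V1CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="inference"
        ),
        min_replicas=2,
        max_replicas=10,
        target_cpu_utilization_percentage=70,  # scale out when CPU exceeds 70%
    ),
)
client.AutoscalingV1Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="ml-serving", body=hpa
)
```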
The role involves optimizing GPU resources using MIG (Multi-Instance GPU) and NOS (Node Offloading System).
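On the MIG side, a rough sketch of inspecting partitioning state via NVML (requires the nvidia-ml-py package and an NVIDIA driver; MIG is only available on A100/H100-class GPUs):

```python
from pynvml import (
    NVMLError, nvmlInit, nvmlShutdown, nvmlDeviceGetCount,
    nvmlDeviceGetHandleByIndex, nvmlDeviceGetMigMode,
    nvmlDeviceGetMaxMigDeviceCount,
)

nvmlInit()
try:
    for i in range(nvmlDeviceGetCount()):
        handle = nvmlDeviceGetHandleByIndex(i)
        try:
            current, pending = nvmlDeviceGetMigMode(handle)
        except NVMLError:
            continue  # this GPU does not support MIG
        print(f"GPU {i}: MIG current={current} pending={pending}")
        if current == 1:  # MIG enabled
            print("  max MIG devices:", nvmlDeviceGetMaxMigDeviceCount(handle))
finally:
    nvmlShutdown()
```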
The engineer will manage cloud storage (e.g., S3) to ensure high availability and performance.
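For instance, a minimal boto3 sketch of the kind of artifact handling involved (bucket and key names are placeholders; adaptive retries are one common availability measure):

```python
import boto3
from botocore.config import Config

# Adaptive retries smooth over transient S3 throttling; values are illustrative.
s3 = boto3.client("s3", config=Config(retries={"max_attempts": 5, "mode": "adaptive"}))

s3.upload_file("model.onnx", "ml-artifacts", "models/v1/model.onnx")

# Hand consumers a time-limited URL instead of long-lived credentials.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "ml-artifacts", "Key": "models/v1/model.onnx"},
    ExpiresIn=3600,
)
print(url)
```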
Responsibilities include integrating state-of-the-art ML techniques, such as LoRA (Low-Rank Adaptation) fine-tuning and model merging, into production workflows.
The engineer will work with SOTA ML codebases and adapt them to organizational needs.
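A minimal sketch of attaching LoRA adapters with the peft library (the base model and target modules below are illustrative and depend on the architecture):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")  # placeholder model
lora = LoraConfig(
    r=8,                # rank of the low-rank update matrices
    lora_alpha=16,      # scaling factor applied to the update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections on OPT-style models
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # typically <1% of the base model's weights
```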
The role includes deploying and managing large language models (LLMs), small language models (SLMs), and large multimodal models (LMMs).
Serving ML models using technologies like Triton Inference Server is also part of the job.
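On the client side, a request against a running Triton server might look like the following sketch (the model name, tensor names, and shapes are placeholders for a deployed model):

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")  # default HTTP port

infer_input = httpclient.InferInput("INPUT__0", [1, 3, 224, 224], "FP32")
infer_input.set_data_from_numpy(np.random.rand(1, 3, 224, 224).astype(np.float32))

result = client.infer(model_name="embedder", inputs=[infer_input])
print(result.as_numpy("OUTPUT__0").shape)
```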
The engineer will leverage solutions such as vLLM, TGI (Text Generation Inference), and other state-of-the-art serving frameworks.
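With vLLM, offline batched generation takes only a few lines; a sketch, with the model name and sampling values purely illustrative:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder; any supported HF causal LM
params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=64)

outputs = llm.generate(["Summarize MLOps in one sentence:"], params)
print(outputs[0].outputs[0].text)
```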
They will optimize models with ONNX and TensorRT for efficient deployment.
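A sketch of running an exported model through ONNX Runtime with TensorRT preferred and graceful fallback (the file name and input shape are placeholders):

```python
import numpy as np
import onnxruntime as ort

# Providers are tried in order; unavailable ones are skipped at session creation.
session = ort.InferenceSession(
    "model.onnx",
    providers=[
        "TensorrtExecutionProvider",
        "CUDAExecutionProvider",
        "CPUExecutionProvider",
    ],
)

input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)  # illustrative shape
outputs = session.run(None, {input_name: dummy})
print(outputs[0].shape)
```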
Developing Retrieval-Augmented Generation (RAG) systems that integrate spreadsheet, math, and compiler processors is required.
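A toy retrieval step, reduced to TF-IDF over an in-memory corpus just to show the shape of such a pipeline (a production system would use a vector store and route spreadsheet, math, and compiler queries to dedicated processors):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "The quarterly spreadsheet shows a 12% rise in GPU spend.",
    "The compiler stage lowers the model graph into TensorRT engines.",
]
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(docs)

def retrieve(query: str, k: int = 1) -> list[str]:
    # Rank documents by cosine similarity to the query; return the top-k.
    scores = cosine_similarity(vectorizer.transform([query]), doc_vectors)[0]
    return [docs[i] for i in scores.argsort()[::-1][:k]]

context = retrieve("How much did GPU spend grow?")
prompt = f"Context: {context}\nQuestion: How much did GPU spend grow?"
# `prompt` would then go to a model served via Triton, vLLM, or TGI.
```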
The engineer will set up monitoring and logging solutions using Grafana, Prometheus, Loki, Elasticsearch, and OpenSearch.
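On the instrumentation side, a minimal sketch of exposing custom metrics for Prometheus to scrape (metric names and the port are illustrative):

```python
import random
import time
from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("inference_requests_total", "Total inference requests served")
LATENCY = Histogram("inference_latency_seconds", "Inference latency in seconds")

start_http_server(9100)  # Prometheus scrapes http://host:9100/metrics

while True:
    REQUESTS.inc()
    with LATENCY.time():                       # records duration into the histogram
        time.sleep(random.uniform(0.01, 0.1))  # stand-in for real inference work
```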
Writing and maintaining CI/CD pipelines using GitHub Actions for seamless deployment processes is expected.
Creating Helm templates for rapid Kubernetes node deployment is part of the responsibilities.
Automating workflows using cron jobs and Airflow DAGs is also required.
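Since Airflow DAGs are plain Python, a minimal sketch of such an automated workflow (the DAG id and task commands are hypothetical; `schedule=` is the Airflow 2.4+ spelling):

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="nightly_model_refresh",
    schedule="@daily",            # cron-style strings like "0 2 * * *" also work
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    export = BashOperator(task_id="export_model", bash_command="python export.py")
    deploy = BashOperator(task_id="deploy_model", bash_command="python deploy.py")
    export >> deploy  # deploy runs only after a successful export
```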
Requirements:
A Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field is required.
Proficiency in Kubernetes, Helm, and containerization technologies is necessary.
Experience with GPU optimization (MIG, NOS) and cloud platforms (AWS, GCP, Azure) is essential.
Strong knowledge of monitoring tools (Grafana, Prometheus) and scripting languages (Python, Bash) is required.
Hands-on experience with CI/CD tools and workflow management systems is necessary.
Familiarity with Triton Inference Server, ONNX, and TensorRT for model serving and optimization is required.
Preferred qualifications include 5+ years of experience in MLOps or ML engineering roles.
Experience with advanced ML techniques, such as multi-sampling and dynamic temperatures, is preferred (see the sketch after this list).
Knowledge of distributed training and large model fine-tuning is a plus.
Proficiency in Go or Rust programming languages is preferred.
Experience designing and implementing highly secure MLOps pipelines, including secure model deployment and data encryption, is preferred.
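Regarding the multi-sampling and dynamic-temperature item above: one plausible reading is drawing several candidates at different temperatures and scoring them downstream. A self-contained sketch of that idea:

```python
import math
import random

def sample_token(logits: list[float], temperature: float) -> int:
    # Temperature-scaled softmax sampling: low T sharpens, high T flattens.
    scaled = [l / temperature for l in logits]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]  # subtract max for stability
    return random.choices(range(len(logits)), weights=weights, k=1)[0]

def multi_sample(logits: list[float], temperatures=(0.2, 0.7, 1.2)) -> list[int]:
    # One candidate per temperature; a reranker would pick the best downstream.
    return [sample_token(logits, t) for t in temperatures]

print(multi_sample([2.0, 1.0, 0.1]))
```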
Benefits:
Working at Fortytwo provides the opportunity to engage in meaningful AI research focused on decentralized inference, multi-agent systems, and efficient model deployment.
Employees will have the chance to build scalable and sustainable AI systems that reduce reliance on massive compute clusters, making advanced models more efficient, accessible, and cost-effective.
The role offers collaboration with a highly technical team of engineers and researchers who are deeply experienced, intellectually curious, and motivated by solving hard problems.
The company values individuals who thrive in research-driven environments, value autonomy, and want to work on foundational AI challenges.
Apply now