Remote Senior MLOps Engineer

Apply now
Please let Fortytwo know you found this job on RemoteYeah. This helps us grow 🌱.

Description:

  • The Senior MLOps Engineer will deploy scalable, production-ready ML services with optimized infrastructure and auto-scaling Kubernetes clusters.
  • The role involves optimizing GPU resources using MIG (Multi-Instance GPU) and NOS (Node Offloading System).
  • The engineer will manage cloud storage (e.g., S3) to ensure high availability and performance.
  • Responsibilities include integrating state-of-the-art ML techniques, such as LoRA (Low-Rank Adaptation) and model merging, into production workflows.
  • The engineer will work with SOTA ML codebases and adapt them to organizational needs.
  • The role includes deploying and managing large language models (LLMs), small language models (SLMs), and large multimodal models (LMMs).
  • The engineer will serve ML models using technologies like Triton Inference Server.
  • They will leverage solutions such as vLLM, TGI (Text Generation Inference), and other state-of-the-art serving frameworks.
  • The engineer will optimize models with ONNX and TensorRT for efficient deployment.
  • They will develop Retrieval-Augmented Generation (RAG) systems integrating spreadsheet, math, and compiler processors.
  • The role requires setting up monitoring and logging solutions using Grafana, Prometheus, Loki, Elasticsearch, and OpenSearch.
  • The engineer will write and maintain CI/CD pipelines using GitHub Actions for seamless deployment processes.
  • They will create Helm templates for rapid Kubernetes node deployment.
  • The engineer will automate workflows using cron jobs and Airflow DAGs.
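
To give a flavor of the CI/CD and Helm duties listed above, a deployment workflow of the kind this role maintains might look like the following minimal GitHub Actions sketch. The repository layout, registry hostname, image name, and chart path are hypothetical, not part of the posting:

```yaml
# Hypothetical workflow: build a model-serving image and upgrade its
# Helm release whenever main is updated.
name: deploy-ml-service
on:
  push:
    branches: [main]
jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build and push serving image
        run: |
          docker build -t registry.example.com/ml/inference:${{ github.sha }} .
          docker push registry.example.com/ml/inference:${{ github.sha }}
      - name: Deploy with Helm
        run: |
          helm upgrade --install inference ./charts/inference \
            --set image.tag=${{ github.sha }}
```

In practice a pipeline like this would also run tests and linting before the build step, and credentials for the registry and cluster would come from repository secrets rather than being inlined.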

Requirements:

  • A Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field is required.
  • Proficiency in Kubernetes, Helm, and containerization technologies is necessary.
  • Experience with GPU optimization (MIG, NOS) and cloud platforms (AWS, GCP, Azure) is required.
  • Strong knowledge of monitoring tools (Grafana, Prometheus) and scripting languages (Python, Bash) is essential.
  • Hands-on experience with CI/CD tools and workflow management systems is needed.
  • Familiarity with Triton Inference Server, ONNX, and TensorRT for model serving and optimization is required.
  • Preferred qualifications include 5+ years of experience in MLOps or ML engineering roles.
  • Experience with advanced ML techniques, such as multi-sampling and dynamic temperatures, is preferred.
  • Knowledge of distributed training and large model fine-tuning is a plus.
  • Proficiency in Go or Rust programming languages is preferred.
  • Experience designing and implementing highly secure MLOps pipelines, including secure model deployment and data encryption, is preferred.

Benefits:

  • Working at Fortytwo offers the opportunity to engage in meaningful AI research, focusing on decentralized inference, multi-agent systems, and efficient model deployment.
  • Employees will have the chance to build scalable and sustainable AI systems that reduce reliance on massive compute clusters, making advanced models more efficient, accessible, and cost-effective.
  • The role provides the opportunity to collaborate with a highly technical team of engineers and researchers who are deeply experienced, intellectually curious, and motivated by solving hard problems.
  • Fortytwo looks for individuals who thrive in research-driven environments, value autonomy, and want to work on foundational AI challenges.