Remote Senior DevOps Engineer (Kubernetes, MLOps, LLMOps)

This job post is closed and the position is probably filled. Please do not apply. The post was closed automatically after the apply link was detected as broken.

Description:

  • The Senior DevOps Engineer will be responsible for ensuring high availability, reliability, and scalability of systems.
  • Architect, build, and monitor cloud-native architectures using Kubernetes, particularly focusing on machine learning and AI workloads.
  • Collaborate with data scientists and ML engineers to streamline the build and deployment process for ML models and LLMs in Kubernetes.
  • Manage infrastructure for continuous integration, delivery, and monitoring of ML models and AI services.
  • Optimize infrastructure for efficient training, deployment, and scaling of ML models and LLMs, utilizing Kubernetes and cloud-native tools like AWS SageMaker.
  • Develop and maintain monitoring and alerting solutions tailored to ML and AI workloads.
  • Troubleshoot and resolve production incidents with minimal downtime.
  • Ensure security and compliance of production systems, particularly protecting sensitive AI and ML data.
  • Mentor and coach junior DevOps engineers.

Requirements:

  • Bachelor's degree in Computer Science, Engineering, or related field.
  • Minimum 7 years of experience in maintaining optimal performance of online production environments.
  • At least 4 years of experience managing production Kubernetes infrastructure.
  • Strong experience with Docker for containerization.
  • Deep understanding of the machine learning lifecycle, including model training, deployment, monitoring, and scaling.
  • Experience with MLOps tools and frameworks such as Kubeflow and MLflow.
  • Familiarity with LLMOps and scripting languages such as Python.
  • Proficiency in infrastructure deployment and automation tools such as Terraform and CloudFormation.
  • Expertise in monitoring and logging solutions such as Prometheus and Grafana.
  • Strong knowledge of Linux systems, networking, and security concepts.
  • Excellent communication and collaboration skills.
  • Experience working in an agile environment.
  • Certifications like CKA or CKAD are a plus.

Benefits:

  • Great team camaraderie and collaboration.
  • Opportunity to work remotely.
  • Competitive salary ranging from $130,000 to $175,000 a year.
  • 3 weeks of paid vacation.
  • Generous medical, dental, and vision plans.
  • Sick leave and paid holidays.
  • Modern technologies and tools for continuous learning.
  • Supportive and self-managing team environment.
  • Amenities such as a stocked kitchen, sit/stand workstations, and a casual work environment.