Remote Site Reliability Engineer - AI & ML Infrastructure (Kubernetes, AWS & Terraform)

Posted 3 months ago

Please let Deepgram know you found this job on RemoteYeah.

Description:

  • Seeking an experienced Site Reliability Engineer to build and operate hybrid infrastructure for AI/ML research and product development.
  • Responsibilities include architecting, building, and maintaining platforms on AWS and in bare-metal data centers using Kubernetes and Terraform.

Requirements:

  • 5+ years of experience in Platform Engineering, DevOps, or Site Reliability Engineering (SRE).
  • Proven experience with Terraform and Kubernetes in large-scale environments.
  • Familiarity with HPC job schedulers like Slurm for GPU workloads.
  • Strong scripting skills in languages such as Python, Go, or Bash.

Benefits:

  • Comprehensive health benefits including medical, dental, and vision.
  • Unlimited PTO and flexible work schedule.
  • Learning and education stipends, plus participation in conferences.
