Remote Staff SRE, AI Infrastructure

Posted 3 weeks ago

Description:

  • Join Andromeda as a Staff SRE to ensure the reliability of AI infrastructure from hardware to customer-facing services.
  • Work with a senior team to define infrastructure operations at scale for leading AI labs and cloud providers.

Requirements:

  • Multiple years of experience operating large-scale GPU infrastructure.
  • Proven track record as a senior engineer responsible for load-bearing infrastructure reliability.
  • Deep knowledge of NVIDIA GPU systems and high-performance networking in production.
  • Proficiency in production-grade engineering with Go, Python, or Rust, and experience with Kubernetes and Linux internals.
  • Strong on-call composure and customer-facing technical presence.

Benefits:

  • Significant autonomy over decisions that affect customer training runs.
  • Opportunity to work on critical infrastructure for ambitious AI labs.

Required experience: 2 years

Salary: not listed

Degree requirement: No degree required
