Remote Senior Software Engineer, Site Reliability

Description:

  • At Gretel, the mission is to build the world’s first developer platform for synthetic data, addressing the data bottleneck problem for developers, data scientists, and AI/ML researchers.
  • The Senior or Staff Site Reliability Engineer (SRE) will ensure the safety, security, and reliability of Gretel's cloud infrastructure, including the compute infrastructure, container orchestration platform, deployment pipelines, and observability stack.
  • Responsibilities include building and maintaining Gretel's observability stack, and measuring and monitoring availability, latency, and overall system health.
  • The role involves scaling systems sustainably with automation and continuously improving and evolving systems.
  • The SRE will manage and lead incident response, recovery, and blameless postmortems.
  • The position requires partnering with software engineers to troubleshoot production issues.
  • The engineer will build tools and frameworks to enhance productivity for Gretel engineers.
  • The role includes shipping complex ML/AI models in collaboration with Gretel's applied science and engineering teams.

Requirements:

  • Candidates must have experience with at least one cloud platform, with a strong preference for AWS.
  • Proficiency in Docker and Kubernetes is required.
  • The ability to write software and tools in Python or Go is necessary.
  • Experience with monitoring, alerting, and operations is essential.
  • Candidates should have experience operating highly available distributed systems in the cloud.
  • The ability to identify, diagnose, and respond to operational outages is required.
  • Preferred qualifications include experience with infrastructure as code tools like Terraform or CloudFormation.
  • Familiarity with build systems such as Bazel is a plus.
  • Experience in shipping applications with complex dependencies, such as PyTorch or TensorFlow, is preferred.
  • Software engineering skills beyond script writing, including TDD and design patterns, are desirable.
  • Experience with DevOps or CI/CD pipelines is also preferred.

Benefits:

  • Compensation for the position will be determined based on interview performance, level of experience, specialization of skills, and market rate.
  • The salary range for the Senior or Staff Site Reliability Engineer role is $180,000 to $230,000 USD per year.
  • During the offer discussion, the recruiter will review the finalized base salary, bonus (if applicable), benefits, perks, and stock options.
  • Gretel is committed to creating an inclusive environment and celebrates diversity among its employees.
  • Accommodations are available for candidates with disabilities during the recruitment process.