Remote Site Reliability Engineer

at Great Question

Posted 1 day ago

Description:

  • The company is a product-focused startup with a small team of 14 engineers dedicated to building tools that enhance decision-making through effective research.
  • The role is for a Site Reliability Engineer who will be the first dedicated DevOps/Infra hire, responsible for end-to-end ownership of platform health, reliability, and scalability.
  • The engineer will collaborate directly with the engineering team to improve systems, reduce toil, and treat infrastructure as a product.
  • Responsibilities include defining and maintaining service SLOs, dashboards, and alerts, improving incident detection and response, and leading incident postmortems.
  • The engineer will maintain and enhance Terraform-managed infrastructure and lead the migration of staging infrastructure to AWS.
  • The role involves identifying bottlenecks, collaborating with engineers on performance optimization, and implementing scaling strategies.
  • The engineer will improve pipeline reliability, design load-testing strategies, and work with the CTO on SOC 2 compliance protocols.
  • Responsibilities also include monitoring and optimizing cloud spend and building tools for cost-aware decision-making.

Requirements:

  • Candidates should have 4–8+ years of experience in DevOps, SRE, or Infrastructure roles.
  • Hands-on experience with AWS services such as EC2, RDS, and VPCs is required.
  • Proficiency in Terraform, GitHub Actions, Docker, and PostgreSQL is necessary.
  • A proven track record of improving observability and reducing incident response times is essential.
  • Experience in high-autonomy, high-ownership environments is preferred.
  • Candidates should be cost-conscious and capable of identifying waste in infrastructure and cloud spending.
  • A passion for building tools that give engineers leverage and for treating infrastructure as a product is important.

Benefits:

  • The role offers the opportunity to shape the systems and culture of software development and operations.
  • The team operates with high trust, significant autonomy, and minimal process, allowing for quick decision-making.
  • The team values thoughtfulness, speed, and care, fostering a collaborative, ego-free environment.
  • There is potential for growth within the company, with pathways to platform leadership, head of Infra/SRE, or principal engineer roles as the team expands.