Description:

We are a product-focused startup with a small team of 14 engineers dedicated to building tools that enhance decision-making through effective research.
As we expand, we need a Site Reliability Engineer to strengthen our infrastructure: better observability, more resilient systems, faster deployments, and smarter cloud spending decisions.
You will be our first dedicated DevOps/Infra hire, with end-to-end ownership of platform health, reliability, and scalability.
Your responsibilities will include defining and maintaining service SLOs, dashboards, and alerts, improving incident detection and response, and establishing best practices around reliability and error budgets.
You will maintain and improve Terraform-managed infrastructure, lead the migration of staging infrastructure to AWS, and scale systems to accommodate growth and changing workloads.
You will work on increasing pipeline reliability, speeding up deployment cycles, and improving rollback confidence.
You will help identify and fix slow database queries, optimize indexes, and support product teams with performance diagnostics.
You will monitor and optimize cloud spending while building visibility and tooling to assist teams in making cost-aware decisions.

Requirements:

You should have at least 4 years of experience (typically 4 to 8) in DevOps, SRE, or Infrastructure roles.
Hands-on experience with AWS services such as EC2, RDS, and VPCs is required.
Proficiency in Terraform, GitHub Actions, Docker, and PostgreSQL is essential.
A proven track record of improving observability and reducing incident response times is necessary.
Experience working in high-autonomy, high-ownership environments is preferred.
You must be cost-conscious and capable of identifying waste in infrastructure and cloud spending.
A passion for building high-leverage tools for engineers and treating infrastructure as a product is important.

What We Offer:

You will have the opportunity to shape both the systems and the culture behind how we build and run software.
The role offers high autonomy with low process, allowing you to make smart decisions and move quickly.
You will work with a team that values thoughtfulness, speed, and care, without ego.
There is a clear growth path into platform leadership, Head of Infra/SRE, or principal engineer roles as the company grows.