The company is a product-focused startup with a small team of 14 engineers dedicated to building tools that enhance decision-making through effective research.
The role is for a Site Reliability Engineer who will be the first dedicated DevOps/Infra hire, responsible for end-to-end ownership of platform health, reliability, and scalability.
The engineer will collaborate directly with the engineering team to improve systems, reduce toil, and treat infrastructure as a product.
Responsibilities include defining and maintaining service SLOs, dashboards, and alerts, improving incident detection and response, and leading incident postmortems.
The engineer will maintain and enhance Terraform-managed infrastructure and lead the migration of staging infrastructure to AWS.
The role involves identifying bottlenecks, collaborating with engineers for performance optimization, and implementing scaling strategies.
The engineer will increase pipeline reliability, design load-testing strategies, and work with the CTO on SOC 2 compliance protocols.
Responsibilities also include monitoring and optimizing cloud spend and building tools for cost-aware decision-making.
Requirements:
Candidates should have 4–8+ years of experience in DevOps, SRE, or Infrastructure roles.
Hands-on experience with AWS services such as EC2, RDS, and VPCs is required.
Proficiency in Terraform, GitHub Actions, Docker, and PostgreSQL is necessary.
A proven track record of improving observability and reducing incident response times is essential.
Experience in high-autonomy, high-ownership environments is preferred.
Candidates should be cost-conscious and capable of identifying waste in infrastructure and cloud spending.
A passion for building high-leverage tools for engineers and treating infrastructure as a product is important.
Benefits:
The role offers the opportunity to shape the systems and culture of software development and operations.
The team operates with a high level of trust, autonomy, and minimal process, which allows for quick decision-making.
The team values thoughtfulness, speed, and care, fostering a collaborative environment without egos.
There is potential for growth within the company, with pathways to platform leadership, head of Infra/SRE, or principal engineer roles as the team expands.