The Senior Site Reliability Engineer will contribute to the evolution of Overstory's GCP infrastructure and DevOps practices, including incident management, SLOs, and error budgets.
The role involves championing observability to reduce mean time to recovery (MTTR) and using DORA metrics to improve product development and optimize GCP usage.
Responsibilities include designing and evolving cloud infrastructure to support scaling needs, building tooling and automation for team autonomy, advancing the observability platform, raising awareness of infrastructure costs, and promoting reliability best practices.
Requirements:
Candidates must be able to prioritize collaboratively between tactical problems and strategic direction.
Proficiency working in a terminal in a Unix-based environment is required.
Experience with Infrastructure as Code (IaC) principles is essential.
Familiarity with the major cloud providers is necessary.
Strong communication skills are needed to express ideas to diverse audiences.
A proactive attitude, organizational skills, and the ability to manage competing deadlines and priorities are important.
Candidates should be comfortable in a fast-paced, changing environment and eager to learn new skills.
A self-starter mindset is required, with the ability to identify and tackle issues independently.
Teamwork and a desire to help others grow and succeed are crucial.
Benefits:
Employees will be part of mission-driven work that aims to reduce wildfires and protect natural resources.
A flexible, autonomous working environment is offered, allowing employees to build their work days around their personal lives.
Additional benefits include a remote working budget, an educational budget, and time for skill development.
Employees will join a vibrant, supportive team that values openness, tolerance, and respect.