Senior Site Reliability Engineer - Databases (Remote, Canada)
This job post is closed and the position is probably filled. Please do not apply.
Description:
We are looking for a Senior Site Reliability Engineer (SRE) to support our highest-value Grafana Cloud customers by increasing the reliability of our cloud databases, which are built on Mimir, Loki, Tempo, and Pyroscope.
The SRE team is a new team within the Databases department. It is responsible for the environments that serve our largest customers and acts as an overlay to the existing teams that run the databases.
As an SRE, you will own the configuration of the software via Helm charts and Jsonnet, take part in the Production Readiness Review (PRR) for new features, shepherd releases into the environment, and ensure that new releases do not degrade SLOs or the user experience.
You will contribute directly to design docs, code, PR reviews, and other engineering activities that improve reliability and observability for customers.
The role includes a shared on-call element, allowing you to focus on customer experience while being supported by another on-call engineer.
We hire globally to maintain a healthy on-call rotation aligned to 12 daylight hours per day.
Requirements:
A strong engineering background with at least 6 years of experience, including at least 3 years in SRE roles.
Experience may include roles as a reliability/production engineer, infrastructure/systems engineer, or software engineer with an infrastructure/systems focus.
Good communication skills, with the ability to hold deep technical conversations with engineers and customers and to collaborate across organizational boundaries.
Experience running Kubernetes on AWS, GCP, or Azure, and with Helm charts or other Infrastructure as Code (IaC) tools.
Familiarity with site reliability engineering, large-scale system design, and distributed computing.
Proficiency in one or more programming languages such as Go, Python, or Java.
Knowledge of Linux operating system internals, networking, cloud storage, and scaling.
Excellent problem-solving and troubleshooting skills.
Experience with blame-free incident response, including writing high-quality Post-Incident Reviews (PIRs).
Ability to work autonomously within an engineering team.
We highly value intellectual curiosity, transparency, a strong bias toward action, and kindness.
Benefits:
The base compensation range for this role in Canada is CAD 146,000 to CAD 175,000. Actual compensation varies with level, experience, and skill set.
Benefits include equity, a bonus (if applicable), and the other benefits listed on the company's careers page.
Grafana Labs promotes a diverse and inclusive workplace and encourages people from all backgrounds to apply.