This job post is closed and the position is probably filled. Please do not apply.
🤖 Automatically closed by a robot after the apply link was detected as broken.
Description:
As a Lead Site Reliability Engineer at Xero, you will be part of the Reliability Enablement team, known as Reliability Rangers, focusing on post-incident analysis, incident learning, and providing specialized reliability enablement and consulting.
You will have the opportunity to work either centrally within the reliability enablement team or embedded within an engineering portfolio, where you will be involved in improving system reliability, on-call health, observability, and addressing operational issues.
Responsibilities include investigating operational surprises, conducting incident analysis, leading reliability uplift initiatives, supporting strategic features with reliability expertise, and improving production operations practices.
The role involves working with highly distributed systems, leading incident management and response efforts, and collaborating with teams to enhance system reliability and robustness.
Required skills include experience in logging, monitoring, and observability; technical leadership; incident management; systems thinking; and proficiency in object-oriented programming languages or infrastructure-as-code.
Preferred qualifications include experience with cloud providers, designing and operating distributed systems, implementing Service Level Objectives (SLOs), and using software engineering to solve operational challenges.
Requirements:
Solid experience in logging, monitoring, and observability of highly distributed systems
Leading incident management and response efforts, including critical incidents
Post-incident reviews, incident analysis, and learning from incidents
Experience in a tech or product company with comparable scale and complexity
Proficiency in object-oriented programming languages or infrastructure-as-code
Technical leadership experience in an operational or site reliability capacity
Experience in delivering technical initiatives and setting technical direction
Preferred: Experience with cloud providers, designing and operating distributed systems, and implementing SLOs
Benefits:
Generous paid leave, including statutory holidays
Dedicated paid leave for physical and mental wellbeing, plus an Employee Assistance Program for mental health care
Free medical insurance, wellbeing and sports programs, employee resource groups
26 weeks of paid parental leave for primary caregivers
Employee Share Plan, beautiful offices, flexible working, career development
Inclusive and collaborative culture, diversity and inclusion initiatives
Encouragement for people from underrepresented groups to apply, even if they do not meet every requirement
NZ Immigration Accredited Employer and Rainbow Tick certified