Remote Lead Site Reliability Engineer

This job post is closed and the position is probably filled. Please do not apply. The post was closed automatically after its apply link was detected as broken.

Description:

  • As a Lead Site Reliability Engineer at Xero, you will be part of the Reliability Enablement team, known as Reliability Rangers, focusing on post-incident analysis, incident learning, and providing specialized reliability enablement and consulting.
  • You will have the opportunity to work either centrally within the reliability enablement team or embedded within an engineering portfolio, where you will be involved in improving system reliability, on-call health, observability, and addressing operational issues.
  • Responsibilities include investigating operational surprises, conducting incident analysis, leading reliability uplift initiatives, supporting strategic features with reliability expertise, and improving production operations practices.
  • The role involves working with highly distributed systems, leading incident management and response efforts, and collaborating with teams to enhance system reliability and robustness.
  • Required skills include experience in logging, monitoring, and observability; technical leadership; incident management; systems thinking; and proficiency in object-oriented programming languages or infrastructure-as-code.
  • Preferred qualifications include experience with cloud providers, designing and operating distributed systems, implementing Service Level Objectives (SLOs), and using software engineering to solve operational challenges.

Requirements:

  • Solid experience in logging, monitoring, and observability of highly distributed systems
  • Experience leading incident management and response efforts, including critical incidents
  • Experience with post-incident reviews, incident analysis, and learning from incidents
  • Experience in a tech or product company with comparable scale and complexity
  • Proficiency in object-oriented programming languages or infrastructure-as-code
  • Technical leadership experience in an operational or site reliability capacity
  • Experience in delivering technical initiatives and setting technical direction
  • Preferred: Experience with cloud providers, designing and operating distributed systems, and implementing SLOs

Benefits:

  • Generous paid leave, including statutory holidays
  • Dedicated paid leave for physical and mental wellbeing, and an Employee Assistance Program for mental health care
  • Free medical insurance, wellbeing and sports programs, employee resource groups
  • 26 weeks of paid parental leave for primary caregivers
  • Employee Share Plan, beautiful offices, flexible working, career development
  • Inclusive and collaborative culture, diversity and inclusion initiatives
  • Encouragement for people from underrepresented groups to apply, even without a perfect match to the requirements
  • NZ Immigration Accredited Employer and Rainbow Tick certified