Remote Staff Site Reliability Engineer - Incident Response

This job is closed

This job post is closed and the position is probably filled. Please do not apply. Closed automatically by a robot after the apply link was detected as broken.

Description:

  • Zscaler is seeking an experienced Staff Site Reliability Engineer - Incident Response to join its Shared Platform Engineer team.
  • The position is remote and requires U.S. citizenship due to the nature of the customers assigned to this role.
  • The role involves leading the transformation to a world-leading SRE organization and promoting SRE principles within the Engineering Department.
  • The engineer will provide expert leadership during critical outages, coordinating multiple teams for streamlined decision-making and quick resolution.
  • A customer-focused approach is essential, addressing global customer environment issues and fostering a culture of continuous learning and technical excellence within the SRE team.
  • Responsibilities include developing and implementing scalable process frameworks and observability strategies for rapid problem diagnosis, response, and service reliability.
  • Collaboration with product teams is necessary to analyze failures and integrate insights to improve service reliability, scalability, and operational efficiency.

Requirements:

  • Candidates must have 5+ years of experience as a Site Reliability Engineer, with relevant experience in an Operations or Engineering environment.
  • Hands-on experience troubleshooting Linux-based systems is required.
  • Networking knowledge is essential, including the ability to troubleshoot TCP/IP, SSL/TLS, DNSSEC, IPsec, and BGP issues.
  • Coding experience, preferably in Python, for building tools, scripting, or automation is necessary.
  • A Bachelor's degree in Computer Science, a related technical field involving computer systems engineering, or equivalent practical experience is required.
  • Preferred qualifications include experience supporting High/Moderate FedRAMP environments and an understanding of observability practices and tools such as Grafana, Datadog, and Splunk.
  • Experience leading major incidents in large-scale, high-uptime environments will make candidates stand out.

Benefits:

  • Zscaler offers various health plans to support employee well-being.
  • Time off plans for vacation and sick time are provided to ensure work-life balance.
  • Parental leave options are available for employees starting or expanding their families.
  • Retirement options are included to help employees plan for their future.
  • Education reimbursement is offered to support continuous learning and professional development.
  • Additional in-office perks and benefits are available to enhance the employee experience.
About the job
Salary
$136,500 - $195,000 USD / year