Remote Staff Site Reliability Engineer - Incident Response
This job post is closed and the position is probably filled. Please do not apply.
Automatically closed by a robot after the apply link was detected as broken.
Description:
Zscaler is seeking an experienced Staff Site Reliability Engineer - Incident Response to join its Shared Platform Engineering team.
The position is remote and requires U.S. citizenship due to the nature of the customers assigned to this role.
The role involves leading the transformation to a world-leading SRE organization and promoting SRE principles within the Engineering Department.
The engineer will provide expert leadership during critical outages, coordinating multiple teams for streamlined decision-making and quick resolution.
A customer-focused approach is essential: the engineer will address issues in customer environments worldwide and foster a culture of continuous learning and technical excellence within the SRE team.
Responsibilities include developing and implementing scalable process frameworks and observability strategies for rapid problem diagnosis, response, and service reliability.
Collaboration with product teams is necessary to analyze failures and integrate insights to improve service reliability, scalability, and operational efficiency.
Requirements:
Candidates must have 5+ years of experience as a Site Reliability Engineer, with relevant experience in an Operations or Engineering environment.
Hands-on experience troubleshooting Linux-based systems is required.
Networking knowledge is essential, including the ability to troubleshoot TCP/IP, SSL/TLS, DNSSEC, IPsec, and BGP issues.
Coding experience, preferably in Python, for building tools, scripting, or automation is necessary.
A Bachelor's degree in Computer Science, a related technical field involving computer systems engineering, or equivalent practical experience is required.
Preferred qualifications include experience supporting High/Moderate FedRAMP environments, an understanding of observability practices and tools such as Grafana, Datadog, and Splunk, and experience leading major incidents in large-scale, high-uptime environments.
Benefits:
Zscaler offers various health plans to support employee well-being.
Time off plans for vacation and sick time are provided to ensure work-life balance.
Parental leave options are available for new parents.
Retirement options are included to help employees plan for their future.
Education reimbursement is offered to support continuous learning and development.
Additional in-office perks and benefits are available to enhance the employee experience.