This job post is closed and the position is probably filled. Please do not apply.
Automatically closed by a robot after the apply link was detected as broken.
Description:
The Site Reliability Engineer - II role focuses on owning the availability of Sumo’s planet-scale observability and security products.
The position is remote and can be performed from anywhere in India.
The engineer will work alongside a global SRE team to execute projects in a reliability roadmap specific to their product area.
Responsibilities include optimizing operations, increasing efficiency in cloud resource usage, enhancing security posture, and improving feature velocity for developers.
The role involves supporting engineering teams by maintaining and executing a reliability roadmap for improvements in reliability, maintainability, security, efficiency, and velocity.
Collaboration with developer infrastructure, Global SRE, and product area engineering teams is essential to refine the reliability roadmap.
Participation in defining, evolving, and managing Service Level Objectives (SLOs) for several teams is required.
The engineer will participate in on-call rotations to understand operational workloads.
The engineer will also complete projects that optimize the on-call experience for engineering teams.
Continuous improvement of the lifecycle of microservices and architectural components is expected.
The role includes writing code and automation to reduce operational workload and improve security posture.
The engineer will work closely with developer infrastructure teams to expedite the adoption of tools that advance the reliability roadmap.
Scaling systems sustainably through automation and driving improvements in reliability and velocity is a key responsibility.
The engineer will facilitate blame-free root cause analysis meetings and participate in global incident response coordination.
The role requires driving root cause identification and issue resolution with teams in a fast-paced, iterative environment.
Requirements:
Candidates must have cloud-native application development experience leveraging best practices and design patterns.
Strong debugging and troubleshooting skills across the entire technology stack are required.
A deep understanding of AWS Networking, Compute, Storage, and managed services is essential.
Competency with modern CI/CD and infrastructure tooling such as Kubernetes, Terraform, Ansible, and Jenkins is necessary.
Experience with full lifecycle support of services, from creation to production support, is required.
Candidates should be versed in Infrastructure as Code practices using technologies like Terraform or AWS CloudFormation.
The ability to author production-ready code in at least one of the following languages: Java, Scala, or Go is required.
Experience with Linux systems and proficiency on the command line is necessary.
Understanding and applying modern approaches to cloud-native software security is essential.
Experience with agile frameworks, such as Scrum and Kanban, is required.
Candidates must be flexible and willing to take on new roles and responsibilities.
A willingness to learn and use Sumo Logic products for solving reliability and security issues is necessary.
A Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or another scientific or technical discipline is required.
A minimum of 2 years of industry experience is necessary.
Benefits:
Employees will have the opportunity to work in a remote environment, providing flexibility in their work-life balance.
The role offers the chance to work with cutting-edge technology in a fast-paced and innovative company.
Employees will gain experience in optimizing operations and improving the reliability of large-scale systems.
The position allows for collaboration with a global team of experts in site reliability engineering.
Employees will have opportunities for professional growth and development through continuous learning and exposure to new technologies.
The company promotes a culture of blame-free root cause analysis, fostering a supportive work environment.