Remote Site Reliability Engineer-II

This job post is closed and the position is probably filled. Please do not apply. It was closed automatically after the apply link was detected as broken.

Description:

  • The Site Reliability Engineer - II role focuses on owning the availability of Sumo Logic’s planet-scale observability and security products.
  • The position is remote and can be performed from anywhere in India.
  • The engineer will work alongside a global SRE team to execute projects in a reliability roadmap specific to their product area.
  • Responsibilities include optimizing operations, increasing efficiency in cloud resource usage, enhancing security posture, and improving feature velocity for developers.
  • The role involves supporting engineering teams by maintaining and executing a reliability roadmap for improvements in reliability, maintainability, security, efficiency, and velocity.
  • Collaboration with development infrastructure, Global SRE, and product area engineering teams is essential to refine the reliability roadmap.
  • Participation in defining, evolving, and managing Service Level Objectives (SLOs) for several teams is required.
  • The engineer will participate in on-call rotations to understand operational workloads and improve the on-call experience.
  • The engineer will complete projects that optimize the on-call experience for engineering teams.
  • Continuous improvement of the lifecycle of microservices and architectural components is expected.
  • The role includes writing code and automation to reduce operational workload and improve security posture.
  • The engineer will work closely with developer infrastructure teams to expedite the adoption of tools that advance the reliability roadmap.
  • Scaling systems sustainably through automation and driving improvements in reliability and velocity is a key responsibility.
  • The engineer will facilitate blame-free root cause analysis meetings and participate in global incident response coordination.
  • The role requires driving root cause identification and issue resolution with teams in a fast-paced iterative environment.

Requirements:

  • Candidates must have cloud-native application development experience leveraging best practices and design patterns.
  • Strong debugging and troubleshooting skills across the entire technology stack are required.
  • A deep understanding of AWS Networking, Compute, Storage, and managed services is essential.
  • Competency with modern CI/CD and infrastructure tooling such as Kubernetes, Terraform, Ansible, and Jenkins is necessary.
  • Experience with full lifecycle support of services, from creation to production support, is required.
  • Candidates should be versed in Infrastructure as Code practices using technologies like Terraform or AWS CloudFormation.
  • The ability to author production-ready code in at least one of the following languages: Java, Scala, or Go is required.
  • Experience with Linux systems and proficiency on the command line is necessary.
  • Understanding and applying modern approaches to cloud-native software security is essential.
  • Experience with agile frameworks, such as Scrum and Kanban, is required.
  • Candidates must be flexible and willing to take on new roles and responsibilities.
  • A willingness to learn and use Sumo Logic products for solving reliability and security issues is necessary.
  • A Bachelor’s or Master’s Degree in Computer Science, Electrical Engineering, or another scientific or technical discipline is required.
  • A minimum of 2 years of industry experience is necessary.

Benefits:

  • Employees will have the opportunity to work in a remote environment, providing flexibility in their work-life balance.
  • The role offers the chance to work with cutting-edge technology in a fast-paced and innovative company.
  • Employees will gain experience in optimizing operations and improving the reliability of large-scale systems.
  • The position allows for collaboration with a global team of experts in site reliability engineering.
  • Employees will have opportunities for professional growth and development through continuous learning and exposure to new technologies.
  • The company promotes a culture of blame-free root cause analysis, fostering a supportive work environment.