This job post is closed and the position is probably filled. Please do not apply.
🤖 Automatically closed by a robot after the apply link was detected as broken.
Description:
The Senior Site Reliability Engineer will join the Everbridge Kubernetes Platform team, focusing on ensuring the overall service quality and availability of Everbridge's solutions.
This role involves designing, deploying, and managing Kubernetes at scale, while promoting SRE best practices and innovative technology.
Responsibilities include owning and maintaining the Kubernetes infrastructure within AWS, which encompasses services such as VPCs, EC2, Transit Gateways, IAM roles and policies, Route53, S3, Security Groups, and NACLs.
The engineer will enhance the operational availability, security, scalability, efficiency, monitoring, instrumentation, and overall service reliability of Everbridge's Kubernetes solutions.
Collaboration with Agile teams, including Architects, Developers, Quality, Data, Security, and other engineers, is essential for designing and implementing highly reliable solutions.
The role requires researching and implementing SRE and Kubernetes best practices, creating automation, fostering cross-functional collaboration, and making data-driven decisions to ensure system integrity and reliability.
Participation in a rotating on-call schedule to address production escalations is also required.
Requirements:
Candidates must have 3+ years of hands-on AWS experience managing and owning systems in a production environment.
At least 2 years of Kubernetes experience (EKS, AKS, GKE, or self-managed) is required.
Applicants should have 3+ years of experience with Terraform or similar Infrastructure as Code (IaC) tools.
Experience with tooling such as GitLab CI/CD, Packer, Docker, EKS, Kubernetes, Spinnaker, Helm, and Jenkins is necessary.
Familiarity with telemetry tools such as Datadog, Sumo Logic, Grafana, and Prometheus is expected.
Candidates should have experience writing automation scripts in languages such as Python, Go, Bash, or Java.
Proficiency with configuration management tools such as Salt, Ansible, or AWS user_data is required.
Experience in a DevOps/SRE production environment and familiarity with Agile practices are essential.
Large-scale production UNIX/Linux experience is also necessary.
Benefits:
Everbridge offers a dynamic work environment that empowers employees to make a difference in keeping people safe and organizations running.
The company provides opportunities for professional growth and development in cutting-edge technology.
Employees can expect a collaborative culture that values innovation and teamwork.
Everbridge is committed to diversity and inclusion, ensuring equal opportunity for all applicants.