Please let Brightspeed know you found this job on RemoteYeah. This helps us grow 🌱.
Description:
Brightspeed is seeking a Principal Site Reliability Engineer to join its team, focusing on implementing and maintaining monitoring for business-critical systems and infrastructure.
The role involves responding to system outages and performance issues and performing root cause analysis to prevent recurrence.
Responsibilities include developing scripts and tools to automate repetitive tasks, ensuring new services and features are reliable and scalable, reducing latency, and improving data transmission speed.
The engineer will define and measure Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to meet performance and availability targets.
Other key duties include conducting postmortems after incidents to identify improvements and collaborating with Lead Application owners and Change Management on code reviews and deployments.
The position requires leading a team of site reliability engineers, mentoring them in system reliability support, and communicating effectively with various stakeholders.
Requirements:
A Master’s degree in computer science, telecommunications, or a related field is required, along with a minimum of 10 years of software engineering experience, including at least 5 years as a site reliability engineer.
Candidates must have a proven track record of managing the reliability of mission-critical, customer-facing applications.
A minimum of 5 years of experience supporting operations and maintenance for cloud-native applications that are fault-tolerant, self-healing, scalable, and highly available is necessary.
Excellent troubleshooting and problem-solving skills, with attention to detail for resolving complex production issues, are essential.
A deep understanding of cloud computing platforms (GCP) and containerization technologies (e.g., Docker, Kubernetes) is required.
Solid experience with core Kubernetes concepts and infrastructure-as-code tools (e.g., Terraform, Ansible, ArgoCD) is needed.
Strong experience with CI/CD pipelines and with integrating code quality tools (SonarQube or Checkmarx) is necessary.
Proficiency in monitoring, logging, and observability tools such as Splunk, GCP logging, and Dynatrace is required.
Candidates must demonstrate the ability to work independently and collaboratively, effectively communicating technical concepts to both technical and non-technical stakeholders.
Proven written and verbal communication skills, including presentation abilities using tools like PowerPoint, are essential.
Bonus points for certifications such as Google Professional Cloud DevOps Engineer or AWS Certified DevOps Engineer.
Benefits:
Brightspeed offers competitive compensation and a comprehensive benefits program that includes medical, dental, vision, and life insurance.
An employee assistance program and a 401(k) plan with company match are part of the benefits package.
The company promotes overall wellness through physical, emotional, and financial health initiatives.
Brightspeed is recognized as a Top Workplace and values a diverse and inclusive work environment, encouraging employees to bring their authentic selves to work.