Remote Site Reliability Engineer Manager at Nivoda

Description:

Nivoda is looking for a skilled Site Reliability Engineering (SRE) Manager to ensure the reliability, scalability, and performance of its infrastructure and services.
This fully remote position allows you to collaborate with a global team.
Responsibilities include owning the production estate, managing incidents, designing incident tracking processes, and developing monitoring and automation tooling.
The role involves building and leading a high-performing SRE team through coaching, mentoring, and fostering a culture of collaboration and innovation.

Requirements:

Proven experience in a senior or lead SRE role with a strong track record in maintaining reliable infrastructure.
Expertise in incident management, monitoring tools like Prometheus and Grafana, and cloud platforms such as AWS, Azure, or GCP.
Proficiency in programming or scripting languages such as Python, Bash, or Go, and in infrastructure-as-code tools such as Terraform or CloudFormation.
Excellent communication and collaboration skills to work effectively in a remote, cross-functional team.
Demonstrated leadership capabilities with a passion for mentoring and developing team members.