Remote Site Reliability Engineer Technical Lead

This job is closed

This job post is closed and the position is probably filled. Please do not apply. The post was closed automatically after the apply link was detected as broken.

Description:

  • Lead the implementation and refinement of Site Reliability Engineering (SRE) practices, including SLOs, error budgets, and blameless postmortems
  • Design and implement automation to enhance system reliability and efficiency
  • Architect scalable hybrid cloud solutions for Web3 infrastructure
  • Manage error budgets and prioritize between reliability and new features based on data-driven decisions
  • Ensure high availability, performance, and reliability under varying load conditions
  • Collaborate with the Platform engineering team to embed reliability into services
  • Align SRE strategies with the technical vision of Nethermind’s Infrastructure Leadership department
  • Implement observability best practices and comprehensive monitoring systems
  • Develop and maintain service level indicators (SLIs) and service level objectives (SLOs) in collaboration with product owners
  • Mentor team members in SRE practices and promote continuous learning
  • Lead capacity planning efforts using quantitative analysis to address future scaling challenges
  • Contribute to long-term technical roadmaps balancing reliability concerns with product innovation

Requirements:

  • 5+ years of experience in Site Reliability Engineering or DevOps
  • Expertise in cloud platforms like AWS and GCP
  • Proficiency in Kubernetes
  • Demonstrated experience in designing and implementing scalable, efficient, resilient systems
  • Deep understanding of Linux/Unix systems and networking protocols
  • Strong programming skills in Python or Go
  • Background in monitoring, observability, and logging systems (e.g., Grafana, Prometheus, Loki)
  • Familiarity with CI/CD tools (e.g., GitHub Actions, ArgoCD)
  • Excellent communication skills to convey complex technical concepts
  • Ability to produce technical documentation, runbooks, presentations, and post-mortem reports
  • Experience in mentoring and upskilling team members

Benefits:

  • Opportunity to lead and mentor a team of Site Reliability Engineers
  • Work on cutting-edge projects in the blockchain space with a globally distributed team
  • Collaborate with renowned companies in the industry
  • Chance to contribute to open-source projects and demonstrate thought leadership in SRE
  • Exposure to MLOps, big data technologies, and blockchain infrastructure
  • Opportunity to gain experience with chaos engineering principles and traffic management technologies
  • Potential for career growth and development in a cross-functional environment