Please let Varo Bank know you found this job on RemoteYeah.
This helps us grow 🌱.
Description:
Varo’s SRE team is responsible for designing, building, and running large-scale, distributed, fault-tolerant systems that power most of Varo's operations.
The team focuses on AWS and Kubernetes, maintaining an open-source-first and results-oriented mindset.
The SRE team emphasizes automation and observability, aiming to automate manual tasks and promote a data-driven approach to scaling the platform.
Daily activities include scaling production infrastructure, building CI/CD pipelines, and collaborating with developers to enhance system performance.
The Staff Site Reliability Engineer (SRE) will ensure the reliability, scalability, and performance of cloud-based services, drive best practices, and contribute to the design and implementation of robust cloud infrastructures.
Responsibilities include leading and mentoring a team of SREs, providing strategic direction for cloud infrastructure on AWS, overseeing observability solutions, and managing service meshes with Istio.
The role involves developing SRE best practices, collaborating with development teams, writing infrastructure as code, and automating operational tasks.
Requirements:
A minimum of 12 years of experience as a Site Reliability, DevOps, or Software Engineer is required, along with proficiency in one or more high-level programming languages such as Python, Go, Ruby, Java, or JavaScript.
Proven leadership experience in SRE team settings, with a focus on driving and architecting projects, is essential.
Expert-level Linux and troubleshooting skills are required.
Experience in building and supporting high-availability cloud environments in AWS is necessary.
Expertise in Infrastructure as Code (IaC) and deployment automation using tools such as Terraform, Helm, GitLab, or equivalent is required.
Experience running Kubernetes and Istio in production environments is essential.
Advanced observability skills with monitoring, logging, and tracing tools such as Prometheus, Grafana, Jaeger/Tempo, ELK/Loki, and OpenTelemetry are required.
Experience in instrumenting code (Java/Kotlin, Python, Go, etc.) and creating instrumentation frameworks for developers is necessary.
Participation in an on-call rotation for after-hours production infrastructure incidents is required.
Experience with SDLC, CI/CD, and related tooling is necessary.
Kafka and message streaming experience is a plus.
Benefits:
The salary range for this position is $200,000 - $220,000 per year, based on function, level, and geographic location.
Final offer amounts are determined by multiple factors, including candidate experience and expertise, and may vary from the identified range.