This job post is closed and the position is probably filled. Please do not apply.
🤖 Automatically closed by a robot after the apply link was detected as broken.
Description:
Varo's well-established SRE team designs, builds, and runs large-scale, distributed, fault-tolerant systems that power most of Varo's operations.
The team focuses on AWS and Kubernetes and maintains an open-source-first, results-oriented mindset.
The SRE team is automation- and observability-focused, striving to automate manual tasks and promote a data-driven approach to scaling the platform.
Daily activities include scaling production infrastructure, building CI/CD pipelines, and collaborating with developers to enhance operations.
As a Staff Site Reliability Engineer (SRE), you will ensure the reliability, scalability, and performance of cloud-based services.
You will drive best practices and contribute to the design and implementation of robust cloud infrastructures while shaping the technical roadmap.
Requirements:
A minimum of 12 years of experience as a Site Reliability, DevOps, or Software Engineer, with proficiency in one or more high-level languages (such as Python, Go, Ruby, Java, or JavaScript), is required.
Proven leadership experience in SRE team settings, with a focus on driving and architecting projects, is essential.
Expert Linux and troubleshooting skills are required.
Experience in building and supporting high-availability cloud environments in AWS is necessary.
Expertise in Infrastructure as Code (IaC) and deployment automation with tools such as Terraform, Helm, GitLab, or equivalent is required.
Experience running Kubernetes and Istio in production is essential.
Advanced observability skills with monitoring, logging, and tracing tools such as Prometheus, Grafana, Jaeger/Tempo, ELK/Loki, and OpenTelemetry are required.
Experience instrumenting code (Java/Kotlin, Python, Go, etc.) and creating simple instrumentation frameworks for developers is necessary.
Participation in an on-call rotation for after-hours production infrastructure incidents is required.
Experience with SDLC, CI/CD, and related tooling is essential.
Kafka and message streaming experience is a plus.
Benefits:
The salary range for this role is $200,000 to $220,000 per year, depending on function, level, and geographic location.
Final offer amounts are determined by multiple factors, including candidate experience and expertise, and may vary from the identified range.