Remote Sr. Site Reliability Engineer

This job is closed

This job post is closed, and the position has probably been filled. Please do not apply. The post was automatically closed by a robot after its apply link was detected as broken.

Description:

  • Varo's SRE team is responsible for designing, building, and running large-scale, distributed, fault-tolerant systems that power most of Varo's operations.
  • The team focuses on AWS and Kubernetes and maintains an open-source-first, results-oriented mindset.
  • Members of the team strive to automate manual tasks and promote a data-driven approach to scaling the platform.
  • Daily activities include scaling production infrastructure, building CI/CD pipelines, and collaborating with developers to enhance operations.
  • Responsibilities include taking ownership of the availability and resiliency of Varo's cloud-based infrastructure, designing disaster recovery scenarios, and implementing self-healing patterns.
  • The role involves writing and maintaining infrastructure as code using Terraform and Kubernetes Helm charts, as well as building and maintaining CI/CD pipelines.
  • The engineer will improve observability and monitoring by implementing advanced tools and technologies and by creating monitoring dashboards, alerts, and logging systems.
  • The position requires leading high-profile incidents and facilitating blameless post-mortems.
  • Collaborating with development teams to implement and improve SLIs and SLOs is essential, as is using monitoring data to derive actionable insights.
  • The engineer will automate operational tasks, write clean and scalable scripts, and manage platform infrastructure and applications.

Requirements:

  • At least 8 years of experience as a Site Reliability Engineer, DevOps Engineer, or Software Engineer is required, with proficiency in a high-level programming language such as Python, Go, Ruby, Java, or JavaScript.
  • Excellent Linux and troubleshooting skills are necessary.
  • Experience in building and supporting high-availability cloud environments in AWS is essential.
  • Proficiency in Infrastructure as Code (IaC) and deployment automation using tools such as Terraform, Helm, GitLab, or equivalents is required.
  • Experience running Kubernetes in production is mandatory.
  • Familiarity with Istio is a plus.
  • Experience with monitoring, logging, and tracing tools such as Prometheus, Grafana, Jaeger/Tempo, ELK/Loki, and OpenTelemetry is required.
  • The candidate should have experience instrumenting code in languages like Java/Kotlin, Python, or Go, and creating simple instrumentation frameworks.
  • Participation in an on-call rotation for after-hours production infrastructure incidents is expected.
  • Experience with the Software Development Life Cycle (SDLC), CI/CD, and related tooling is necessary.
  • Kafka experience is a plus.

Benefits:

  • The salary range for this role is between $150,000 and $190,000 per year, based on function, level, and geographic location.
  • Final offer amounts are determined by multiple factors, including candidate experience and expertise, and may vary from the identified range.