Remote Site Reliability Engineer (m/f/d)

at Ververica GmbH

Posted 1 day ago

Description:

  • Ververica is seeking a Site Reliability Engineer (SRE) to design, provision, and maintain the infrastructure for its Unified Streaming Data Platform across multiple cloud providers, including AWS, GCP, and Azure.
  • The SRE will collaborate with software engineering teams to develop solutions that enhance feature delivery, optimize performance, and address security vulnerabilities.
  • Responsibilities include building and maintaining infrastructure, managing Infrastructure as Code (IaC) using Terraform, implementing observability tooling, ensuring system reliability, improving infrastructure architecture, enhancing CI/CD pipelines, monitoring security vulnerabilities, contributing to product development, participating in on-call rotations, and maintaining documentation.

Requirements:

  • A Bachelor’s degree in Computer Science, Information Technology, or a related field is required.
  • A minimum of 2 years of hands-on experience with Kubernetes clusters, Helm charts, controllers, and operators is necessary.
  • Proficiency in designing and maintaining Terraform code that follows best practices is essential.
  • Strong knowledge of observability tools and practices, including metrics, logging, and alerting systems, is required.
  • Experience implementing SRE principles such as SLIs, SLOs, and error budgets is needed.
  • A solid understanding of Linux systems and networking in cloud environments is important.
  • Hands-on experience managing multiple Kubernetes clusters is required.
  • Familiarity with distributed systems or streaming data platforms is preferred.
  • Knowledge of cloud-native security best practices is necessary.

Benefits:

  • The position offers the opportunity to work with cutting-edge technology in real-time data processing and analytics.
  • Employees will have the chance to collaborate with talented teams and contribute to the development of innovative products and features.
  • The role includes participation in on-call rotations, providing experience in managing incidents on live infrastructure that runs 24/7.
  • There is a focus on continuous learning and improvement, allowing for professional growth in the field of Site Reliability Engineering.