Ververica is seeking a Site Reliability Engineer (SRE) to design, provision, and maintain the infrastructure for its Unified Streaming Data Platform across multiple cloud providers, including AWS, GCP, and Azure.
The SRE will collaborate with software engineering teams to develop solutions that enhance feature delivery, optimize performance, and address security vulnerabilities.
Responsibilities include building and maintaining infrastructure, managing Infrastructure as Code (IaC) with Terraform, implementing observability tooling, ensuring system reliability, improving infrastructure architecture, enhancing CI/CD pipelines, monitoring for security vulnerabilities, contributing to product development, participating in on-call rotations, and maintaining documentation.
Requirements:
A Bachelor’s degree in Computer Science, Information Technology, or a related field is required.
A minimum of 2 years of hands-on experience with Kubernetes clusters, Helm charts, controllers, and operators is necessary.
Proficiency in designing and maintaining Terraform code that follows best practices is essential.
Strong knowledge of observability tools and practices, including metrics, logging, and alerting systems, is required.
Experience implementing SRE principles such as SLIs, SLOs, and error budgets is needed.
A solid understanding of Linux systems and networking in cloud environments is important.
Hands-on experience managing multiple Kubernetes clusters is required.
Familiarity with distributed systems or streaming data platforms is preferred.
Knowledge of cloud-native security best practices is necessary.
Benefits:
The position offers the opportunity to work with cutting-edge technology in real-time data processing and analytics.
Employees will have the chance to collaborate with talented teams and contribute to the development of innovative products and features.
The role includes participation in on-call rotations, which provides experience managing incidents on live infrastructure that runs 24/7.
There is a focus on continuous learning and improvement, allowing for professional growth in the field of Site Reliability Engineering.