Zefr is a leading global technology company focused on responsible marketing in social environments.
The company provides solutions for brands to manage content adjacency on platforms like YouTube, Meta, TikTok, and Snap.
As a Site Reliability Engineer, you will apply your expertise in cloud infrastructure, CI/CD, Observability, and core SRE concepts.
You will work closely with Engineering and Data Science teams to ensure robust, efficient, and scalable infrastructure.
The role involves supporting and building systems and tools for engineers to manage product features.
You will deploy and support a multi-cloud, microservice architecture using GitHub Actions, Argo CD, and Kubernetes.
Collaboration with engineers to architect secure, resilient, scalable, and cost-efficient applications in AWS and GCP is essential.
You will foster a DevOps culture by encouraging continuous improvement across engineering teams.
Proactive maintenance of production environments, including monitoring application performance, is required.
Participation in a 24/7 on-call rotation to respond to system performance issues and outages is expected.
You will debug code at both the application and infrastructure levels and mature CI/CD workflows.
A forward-thinking approach is necessary, including researching and proposing new solutions.
You will propose and review Engineering Requests for Comments (RFCs) to influence engineering architecture and practices.
Requirements:
A minimum of 4 years of experience designing, managing, deploying, and supporting cloud infrastructure in a production environment on major public cloud providers is required, including experience with either GCP or AWS.
Production experience in designing, managing, deploying, and maintaining container-based workloads in Kubernetes clusters is necessary.
Knowledge of GitOps and modern CI/CD pipelines, including tools like GitHub Actions, GitLab, CircleCI, Argo CD, and Flux, is required.
Familiarity with Infrastructure as Code (IaC) and configuration management tools such as Terraform, OpenTofu, Crossplane, Pulumi, Ansible, and CloudFormation is essential.
Strong problem-solving skills with a focus on automation are required.
Experience with Monitoring and Observability tools like Prometheus, Grafana, Datadog, Thanos, New Relic, and OpenTelemetry is necessary.
Understanding of Cloud Networking concepts, including Mesh Networking, NAT, Load Balancers, SSL Certificates, and API Gateways, is required.
Strong written and verbal communication, organization, and documentation skills are essential.
Benefits:
Zefr offers a flexible work environment that allows team members to work from home, local spots, or the London office.
A monthly allowance is provided for Health Care, Dental, Optical, Income Protection, and Relevant Life.
The company has a Pension Scheme with a 3% contribution from the Company.
Employees receive a total of 28 days of holiday per year, including UK Bank Holidays.
A flexible hybrid work schedule is available, along with Summer Fridays, when employees leave early.