Remote Site Reliability Engineer

at Zefr

Description:

  • Zefr is a leading global technology company focused on responsible marketing in social environments.
  • The company provides solutions for brands to manage content adjacency on platforms like YouTube, Meta, TikTok, and Snap.
  • As a Site Reliability Engineer, you will apply your expertise in cloud infrastructure, CI/CD, observability, and core SRE concepts.
  • You will work closely with Engineering and Data Science teams to ensure robust, efficient, and scalable infrastructure.
  • The role involves supporting and building systems and tools for engineers to manage product features.
  • You will deploy and support a multi-cloud microservice architecture using GitHub Actions, Argo CD, and Kubernetes.
  • Collaboration with engineers to architect secure, resilient, scalable, and cost-efficient applications in AWS and GCP is essential.
  • You will foster a DevOps culture by encouraging continuous improvement across engineering teams.
  • Proactive maintenance of production environments, including monitoring application performance, is required.
  • Participation in a 24/7 on-call rotation to respond to system performance issues and outages is expected.
  • You will debug code at both the application and infrastructure levels and mature CI/CD workflows.
  • A forward-thinking approach is necessary, including researching and proposing new solutions.
  • You will propose and review Engineering Requests for Comments (RFCs) to influence engineering architecture and practices.

Requirements:

  • A minimum of 4 years of experience designing, managing, deploying, and supporting Cloud Infrastructure in a production environment using major public cloud providers, with experience in either GCP or AWS required.
  • Production experience in designing, managing, deploying, and maintaining container-based workloads in Kubernetes clusters is necessary.
  • Knowledge of GitOps and modern CI/CD pipelines, including tools like GitHub Actions, GitLab, CircleCI, Argo CD, and Flux, is required.
  • Familiarity with Infrastructure as Code (IaC) and configuration management tools such as Terraform, OpenTofu, Crossplane, Pulumi, Ansible, and CloudFormation is essential.
  • Strong problem-solving skills with a focus on automation are required.
  • Experience with monitoring and observability tools like Prometheus, Grafana, Datadog, Thanos, New Relic, and OpenTelemetry is necessary.
  • Understanding of Cloud Networking concepts, including Mesh Networking, NAT, Load Balancers, SSL Certificates, and API Gateways, is required.
  • Strong written and verbal communication, organization, and documentation skills are essential.

Benefits:

  • Zefr offers a flexible work environment that allows team members to work from home, local spots, or the London office.
  • A monthly allowance is provided for Health Care, Dental, Optical, Income Protection, and Relevant Life.
  • The company offers a Pension Scheme with a 3% employer contribution.
  • Employees receive a total of 28 days of holidays per year, including UK Bank Holidays.
  • A flexible hybrid work schedule is available, along with Summer Fridays where employees leave early.