Remote Site Reliability Engineer

Posted 8 months ago

Please let bet365 know you found this job on RemoteYeah. This helps us get more companies to post jobs here for you.

Description:

  • As a Site Reliability Engineer, you will enhance system reliability, observability, and performance through a strong engineering approach and assist with incident resolution and best practices.
  • You will have software engineering skills, focusing on system reliability and observability.
  • Your responsibilities include monitoring the health, performance, and availability of critical systems, directly impacting operational efficiency.
  • You will implement solutions that enhance reliability, including instrumenting services with tools such as OpenTelemetry, improving logging practices, and developing features for maintainability.
  • You will help engineer tools and automation for effective service management.
  • Collaboration is key, as you will work across multiple functions to integrate reliability and observability best practices into the software development life cycle.
  • By supporting governance standards set by the central teams, you will foster a culture where these principles are integral to development.
  • Your contributions will ensure our systems meet user demands and enhance overall service performance.
  • This role is eligible for inclusion in the Company’s hybrid working from home policy.

Requirements:

  • You must have excellent knowledge of Site Reliability Engineering principles, including the creation and management of effective Service Level Indicators (SLI) and Service Level Objectives (SLO) for reliability and customer satisfaction.
  • Knowledge of contemporary observability tools, techniques, and best practices, including Splunk, New Relic, Grafana, and PagerDuty, is required.
  • You should possess excellent knowledge of programming languages including Python, Golang, and JavaScript.
  • Knowledge and experience of modern software development techniques and lifecycles are necessary.
  • Experience with Infrastructure as Code (IaC) automation and orchestration tools such as Ansible and Terraform is essential.
  • Prior experience working in a large-scale, 24/7 enterprise where system uptime and stability are of paramount importance to the business is required.
  • A keen interest in industry trends, particularly Platform Engineering, is expected.
  • Proficiency in shell scripting for automation and system management tasks is necessary.

Benefits:

  • You will have the opportunity to write and contribute to code that enhances the reliability and observability of services, including telemetry, operational APIs, and tooling.
  • You will develop and maintain tools that facilitate effective management of our systems, ensuring they are operationally efficient and resilient.
  • The role involves working with automation and orchestration platforms to automate manual activity and reduce toil.
  • You will build sophisticated dashboards using a range of telemetry data and dashboarding technologies like Grafana, Splunk, and New Relic.
  • You will maintain and administer existing monitoring and analytic toolsets.
  • Mentoring colleagues in the use of new technologies or practices will be part of your responsibilities.
  • You will actively participate in live incident resolution and post-mortem analysis, providing effective remediation strategies to improve overall system health and prevent future issues.
  • You will drive initiatives to enhance system reliability and observability, contributing to a culture of continuous improvement.
  • You will collaborate with the central Site Reliability Engineering and Observability teams to establish and uphold standards for reliability and observability, and assist other teams in adhering to these practices.
  • You will work with IT Operations, providing and supporting the use of critical tooling to enable increasing levels of value to the business.

Degree requirement

No degree required
