Remote Site Reliability Engineer - SRE


This job is closed

This job post is closed and the position is probably filled. Please do not apply. The post was closed automatically after the apply link was detected as broken.

Description:

  • Maintain and improve platform reliability, availability, and performance using Azure as the core cloud platform and industry-leading tools.
  • Collaborate with cross-functional teams to design, implement, and maintain resilient systems, automating operations to minimize downtime.
  • Proactively identify and resolve potential issues to prevent customer impact.
  • Contribute to continuous improvement of infrastructure and processes.
  • Analyze reliability challenges and develop automated solutions for incident resolution.
  • Work with development teams to enhance operational features that reduce MTTD and MTTR and enable auto-recovery.
  • Establish SLIs, SLOs, error budgets, and policies for operational performance.
  • Identify and address toil, conduct post-mortems, and implement continuous improvements in production operations.
  • Provide advanced technical support for cross-product issues and incidents.
  • Utilize SRE tooling to fulfill the SRE mission, conduct chaos testing, and implement new tools and technologies for platform efficiency.
  • Drive the reliability and supportability of cloud services, including change management, customer escalations, remediation plans, playbooks, and automation.
  • Monitor system health, scale systems sustainably through automation, and improve services throughout their lifecycle.

Requirements:

  • 4+ years of experience in reliability engineering.
  • 2+ years of recent experience with Azure systems.
  • Advanced knowledge of the New Relic ecosystem.
  • Working knowledge of monitoring and APM tools like Azure App Insights, Grafana, and Selenium.
  • Familiarity with networking and with troubleshooting latency, connectivity, and performance issues.
  • Experience with infrastructure as code (IaC) using Terraform and configuration as code (CaC) with Ansible.
  • Hands-on experience with SRE practices, chaos engineering experiments, and containerization.
  • Proficiency in Linux and Windows administration, troubleshooting, and support.
  • Experience with databases such as SQL Server, MongoDB, and PostgreSQL.
  • Knowledge of C#, .NET, PowerShell, Python, or Golang.
  • Experience with high-availability and distributed systems.
  • Proficiency in Azure DevOps and strong debugging skills across integrated platforms.

Benefits:

  • Opportunity to work remotely from anywhere in the United States.
  • Full-time position with a focus on maintaining and improving platform reliability.
  • Collaborative work environment with cross-functional teams.
  • Utilization of industry-leading tools and technologies.
  • Continuous learning and development opportunities in a dynamic environment.
  • Competitive salary and benefits package.
  • Equal opportunity employer with a commitment to diversity and inclusion.