Please let Vultr know you found this job on RemoteYeah. This helps us grow 🌱.
Description:
Vultr is seeking a Staff Site Reliability Engineer to join their Platform team, which is essential to the company's growth strategy.
The role involves collaborating with cross-functional teams to create and implement a modern observability stack and improve incident-handling processes.
Responsibilities include designing cloud provider solutions for high-performance computing, AI training, and inference workloads, with a focus on Observability and MLOps.
The engineer will enhance system resilience and stability through software improvements, architecture, and automation.
The position requires addressing challenges that range from low-level hardware issues to scaling high-level distributed applications.
The engineer will champion DevOps and SRE principles through automation and collaboration within the engineering team.
A key expectation is improving the customer experience by enhancing case handling, with an emphasis on proactive responses and automated resolutions.
The role includes developing documentation to assist junior SREs in managing recurring reliability issues.
The engineer will identify and implement scalable solutions to technical challenges, setting benchmarks for innovation.
Requirements:
Candidates must have 3+ years of experience in a hands-on SRE role delivering distributed architectures.
At least 2 years of experience working with and maintaining Kubernetes clusters in highly available and regulated environments is required.
Applicants should have 2+ years of hands-on experience with a modern Grafana stack, including Mimir, Loki, and Tempo.
Experience with complex CI/CD pipelines (GitLab/Jenkins), configuration management (Puppet/Salt), and Infrastructure as Code (IaC) solutions such as Terraform is necessary.
Familiarity with observability pipelines or OpenTelemetry is a plus.
A background in performance optimization for web stacks, including components like PHP-FPM, Nginx, and MySQL, is preferred.
Strong programming skills in Python, Golang, or PHP are essential, along with a willingness to learn new technologies.
Benefits:
Vultr offers a 100% remote work environment along with a company-wide virtual get-together.
Employees can participate in a 401(k) plan that matches 100% up to 4% with immediate vesting.
There is a Professional Development Reimbursement of $2,500 each year for career growth.
The company provides 11 holidays, paid time off accrual, a rollover plan, and the option to take your birthday off.
Increased PTO is available at the 3-year anniversary, along with a 1-month sabbatical at the 5-year anniversary and an anniversary bonus each year.
A $500 remote office setup allowance is provided for the first year, with $400 each year following for new equipment.
Monthly internet reimbursement of up to $75 is included.
Employees receive $50 per month for a gym membership.