Remote Site Reliability Engineer

at Weekday AI

Description:

This role is for one of Weekday's clients.
The position is full-time and requires a minimum of 5 years of experience.
The Site Reliability Engineer will help build and maintain highly reliable, scalable, and secure infrastructure and applications.
The role will focus on automating operations, improving system performance, and ensuring overall service health by applying modern SRE practices.
Key responsibilities include designing, implementing, and managing Kubernetes-based infrastructure.
The engineer will use AWS services such as IAM, EC2, EKS, S3, and CloudWatch to build and support scalable cloud environments.
The engineer will develop and maintain automation scripts and tools in Shell or Python.
The engineer will proactively identify, analyze, and troubleshoot complex application, network, and system-level issues.
The engineer will optimize system performance and reliability, which requires deep expertise in Linux debugging and performance tuning.
The engineer will build automation for system self-healing and recovery.
The engineer will develop monitoring and alerting solutions for high-performance, low-latency applications.
The engineer will collaborate with development and operations teams to implement effective CI/CD pipelines.
The engineer will apply SRE principles including service monitoring, alerting, error budget tracking, capacity planning, fault tolerance, automation, and toil reduction.
The engineer is expected to continuously look for opportunities to improve system reliability and engineering processes.

Requirements:

Proven experience working with Kubernetes in production environments is required.
A strong command of AWS cloud services with hands-on experience in infrastructure provisioning and management is necessary.
Proficiency in scripting or programming, preferably in Shell or Python, is essential.
In-depth Linux knowledge, including tools for diagnostics and performance optimization, is required.
Familiarity with modern observability tools for monitoring, logging, and alerting is necessary.
Strong troubleshooting and problem-solving skills are essential for this role.
An understanding of SRE concepts and best practices, along with experience applying them, is required.

Benefits:

The position offers the opportunity to work with cutting-edge technologies in a dynamic environment.
Employees will have the chance to enhance their skills in Site Reliability Engineering and cloud infrastructure.
The role provides a platform for collaboration with talented development and operations teams.
There are opportunities for continuous learning and professional growth within the organization.

Hiring company

Weekday AI

www.weekday.works

About the job

Posted on

July 2, 2025

Job type

Full-time

Location requirements

🇮🇳 India

Job title

Site Reliability Engineer

Experience level

Mid-level

Degree requirement

🎓🚫 No degree required

Skills

Docker, AWS, Kubernetes, EKS, CI/CD, Python, Shell, CloudWatch
