Please, let ZayZoon know you found this job
on RemoteYeah.
This helps us grow 🌱.
Description:
ZayZoon is seeking a Senior Site Reliability Engineer to enhance its cloud infrastructure with complex AWS builds, infrastructure-as-code, and observability/logging/APM solutions.
The role involves working in an embedded reliability team alongside app and data engineers to monitor, benchmark, and scale ZayZoon’s products.
Responsibilities include developing and maintaining CloudFormation infrastructure-as-code templates, with a focus on serverless and container resources such as ECS, Fargate, and Lambda.
The engineer will perform instrumentation and daily metrics analysis of infrastructure performance and Ruby on Rails applications using AWS tools and third-party observability platforms.
Managing deployment pipelines, including blue/green deployments and intelligent auto-scaling, is a key responsibility.
The role requires maintaining resource dependencies, particularly for databases, and planning for updates and downtime.
The engineer will forecast costs and implement AWS cost-saving measures, such as reserved instances.
Collaboration with the risk and security teams to maintain ongoing SOC 2 and cybersecurity compliance is essential.
Extensive collaboration with app developers on shared metrics, database performance, and load testing is expected.
The engineer will also work with data engineers to facilitate data warehouse development, ELT, and ETL processes.
Participation in the agile development process, including sprint planning, story grooming, and stand-ups, is required.
Adherence to SDLC and secure coding practices is mandatory.
Requirements:
Candidates must have 5+ years of infrastructure experience.
At least 2 years of AWS experience, including AWS certification and deploying production applications, is required.
Proficiency with Infrastructure as Code (IaC), specifically CloudFormation, is necessary.
Experience with containerization technologies such as Docker, ECS, and ECR is essential.
Candidates should have experience analyzing and resolving performance issues using observability platforms such as Datadog, New Relic, and OpenTelemetry (OTel).
The ability to build quickly for experimentation and cleanly for core functionality is important.
Strong SQL and data analysis skills, along with a willingness to engage in data-driven problem-solving, are required.
Benefits:
ZayZoon offers a fully remote work environment across Canada and the US.
The company is committed to reviewing every application and providing timely feedback to candidates.
Candidates can expect a supportive hiring process and will be kept informed throughout.