Our US-based client is looking for a mission-driven Site Reliability Engineer to support and scale the infrastructure powering their secure, mission-critical SaaS platform.
You must be confident in operating and debugging both modern infrastructure (cloud-native, containerized services) and classic Windows production environments (IIS, SQL Server AlwaysOn, Service Broker).
The role requires the ability to respond to incidents quickly, support ongoing automation, and scale systems reliably.
Responsibilities:
You will be part of the team that owns the uptime and performance of core backend infrastructure (Windows + Linux).
You will maintain and enhance observability across systems using Kibana, CloudWatch, and custom telemetry.
The position involves managing CI/CD pipelines, infrastructure as code (Terraform, Ansible), and deployment automation.
You will support and maintain production Windows environments, including .NET Framework/Core apps running in IIS and SQL Server with AlwaysOn replication and Service Broker-based messaging.
Additionally, you will support and operate cloud-native services such as AWS Lambda, DynamoDB, Postgres/Aurora, Redshift, Redis, and containerized workloads in Docker.
Participation in on-call rotation and incident response is required.
You will collaborate closely with engineering teams to improve system reliability and deployment workflows.
Requirements:
A minimum of 5 years of SRE, DevOps, or WebOps experience supporting production SaaS systems is required.
Strong experience with Windows Server, IIS, and .NET applications in production is necessary.
Hands-on experience with SQL Server administration, including AlwaysOn and Service Broker, is essential.
Proficiency in AWS operations, including Lambda, DynamoDB, CloudWatch, and IAM, is required.
Familiarity with Postgres, Redis, Kibana/Elasticsearch, and centralized logging is expected.
Experience with Docker, Terraform, and Ansible for infrastructure management is necessary.
Strong scripting skills in PowerShell and Python are required.
Experience running and debugging containerized and distributed systems in production is essential.
Excellent incident response and debugging skills are a must.
Benefits:
The salary for this position is $6,000 USD per month.
Holidays are included as part of the benefits package.
The position offers unlimited PTO, allowing for flexible time off.