Remote Site Reliability Engineer

at Sur

Posted 1 day ago

Description:

  • Our US-based client is looking for a mission-driven Site Reliability Engineer to support and scale the infrastructure powering their secure, mission-critical SaaS platform.
  • You must be confident operating and debugging both modern infrastructure (cloud-native, containerized services) and classic Windows production environments (IIS, SQL Server AlwaysOn, Service Broker).
  • The role requires the ability to respond to incidents quickly, support ongoing automation, and scale systems reliably.
  • You will be part of the team that owns the uptime and performance of the core backend infrastructure (Windows and Linux).
  • You will maintain and enhance observability across systems using Kibana, CloudWatch, and custom telemetry.
  • The position involves managing CI/CD pipelines, infrastructure as code (Terraform, Ansible), and deployment automation.
  • You will support and maintain production Windows environments, including .NET Framework/Core apps running in IIS and SQL Server with AlwaysOn replication and Service Broker-based messaging.
  • Additionally, you will support and operate cloud-native services such as AWS Lambda functions, DynamoDB, Postgres/Aurora, Redshift, Redis, and containerized workloads in Docker.
  • Participation in on-call rotation and incident response is required.
  • You will collaborate closely with engineering teams to improve system reliability and deployment workflows.

Requirements:

  • At least 5 years of SRE, DevOps, or WebOps experience supporting production SaaS systems is required.
  • Strong experience with Windows Server, IIS, and .NET applications in production is necessary.
  • Hands-on experience with SQL Server administration, including AlwaysOn and Service Broker, is essential.
  • Proficiency in AWS operations, including Lambda, DynamoDB, CloudWatch, and IAM, is required.
  • Familiarity with Postgres, Redis, Kibana/Elasticsearch, and centralized logging is expected.
  • Experience with Docker, Terraform, and Ansible for infrastructure management is necessary.
  • Strong scripting skills in PowerShell and Python are required.
  • Experience running and debugging containerized and distributed systems in production is essential.
  • Excellent incident response and debugging skills are a must.

Benefits:

  • The salary for this position is $6,000 USD per month.
  • Holidays are included as part of the benefits.
  • The position offers unlimited PTO, allowing for flexible time off.