Remote Site Reliability Engineer at Suzy

Description:

Suzy is hiring a Site Reliability Engineer to ensure the reliability, scalability, and performance of software systems and infrastructure.
The role involves bridging the gap between software development and operations by applying engineering practices to IT operations.
Responsibilities include using automation and monitoring tools to maintain system health and improve the availability and uptime of services.
The engineer will identify and resolve incidents and implement solutions to prevent future issues.
Additional focus areas include optimizing system performance, capacity planning, and creating systems that handle operational workloads efficiently.
SREs will monitor systems to ensure they are running smoothly and intervene when issues arise.
They will develop and maintain automation tools and processes to reduce manual work and improve system efficiency.
Managing infrastructure, including servers, networks, databases, and storage systems, is a key responsibility.
Incident management involves responding to and resolving system outages or performance issues.
SREs will analyze system usage patterns and develop capacity plans to handle expected traffic and usage.
Continuous improvement of system reliability, scalability, and performance is essential, including analyzing data to identify areas for improvement.
Collaboration with development teams, operations teams, and other stakeholders is necessary to ensure systems meet business needs.

Requirements:

Candidates should have an understanding of relational and NoSQL databases, including replication, scaling, and backup strategies.
Knowledge of designing and managing virtual networks, subnets, and network security groups is required.
Experience in configuring and managing load balancing and traffic routing is preferred.
Expertise in infrastructure-as-code (IaC) for provisioning resources is necessary; experience with Bicep or ARM templates is a plus.
Candidates must be able to manage and scale containerized applications using Kubernetes.
Experience in monitoring the performance, availability, and health of applications and infrastructure is essential.
Candidates should be able to collect and analyze log data from various resources for troubleshooting and insights.
Familiarity with Prometheus and Grafana for enhanced monitoring, alerting, and observability is required.
Understanding of secret management, key encryption, and certificate management is necessary.
Candidates should understand cloud infrastructure and application performance, including optimizing SQL queries, storage, and compute resources.
Experience in backup strategies and disaster recovery solutions is required.
The ability to diagnose incidents and identify root causes for system failures or performance degradation is essential.
Candidates must be able to respond to critical incidents and participate in on-call rotations to ensure system availability.
Automating resource management and day-to-day tasks using scripting tools such as Python, PowerShell, and the Azure CLI is necessary.
Writing infrastructure-as-code to deploy and manage resources consistently and reliably is required.
Expertise in managing containers and container orchestration with Kubernetes is essential.
Required skills include an understanding of most Azure resources and related services, core AWS resources, Kubernetes, infrastructure-as-code using ARM templates and Bicep, Grafana, Elasticsearch, C# .NET, Python, networking, DevOps/GitHub, Azure CLI, SQL, MongoDB, and Bash/shell scripting.

Benefits:

Suzy offers generous health, dental, and vision benefits, and a 401(k) plan that vests immediately.
Employees enjoy a friendly, fun, and collaborative work environment with frequent exposure to executives.
Employees have the opportunity to make an immediate impact at a fast-growing company.
The target base salary for this role is between $110,500 and $130,000.