This job post is closed and the position is probably filled. Please do not apply.
Automatically closed by a robot after the apply link was detected as broken.
Description:
Monitor and troubleshoot production incidents proactively, identifying and resolving issues quickly and efficiently.
Implement automated monitoring and alerting systems for early detection of potential problems.
Collaborate with development teams to perform deployments and rollbacks with minimal disruption.
Optimize the performance and scalability of AWS infrastructure, including DynamoDB, MySQL, S3, CloudSearch, OpenSearch, Kafka, Presto, SES, and EC2.
Write and maintain infrastructure code using Terraform and scripts to automate tasks and improve operational efficiency.
Proactively identify and address potential security vulnerabilities.
Participate in incident response and post-mortem analysis activities to identify root causes and prevent future occurrences.
Help onboard and mentor junior team members, sharing knowledge and expertise.
Stay up to date on the latest cloud technologies and best practices for SRE.
Participate in a low-volume on-call rotation with other Site Reliability Engineers.
Explore new technologies and innovative solutions to improve service quality and speed to market.
Participate in technical discussions and deep dives with other engineering and product teams.
Requirements:
5+ years of experience as a Site Reliability Engineer or in a similar role.
Strong understanding of AWS cloud services, including DynamoDB, MySQL, S3, CloudSearch, OpenSearch, Kafka, Presto, SES, and EC2.
Experience with infrastructure automation tools like Ansible, Terraform, or CloudFormation.
Experience with monitoring and alerting tools like DataDog, Prometheus, Grafana, Kibana, and PagerDuty.
Experience with CI/CD pipelines and deployment strategies.
Strong problem-solving and analytical skills.
Excellent communication and collaboration skills.
Ability to work independently and take ownership of complex tasks.
Passion for technology and a desire to learn and grow.
Experience with Rancher, Cattleprod, Jenkins, TeamCity, PostgreSQL, and MongoDB.
Experience with security best practices and tools.
Experience working in a fast-paced, agile environment.
Benefits:
The base salary range for this position is $120,000 - $140,000 annually.
Comprehensive benefits including medical, dental, vision, life & disability insurance, a 401(k) plan with company match, and an unlimited PTO policy.