This job post is closed and the position is probably filled. Please do not apply.
Automatically closed by a robot after the apply link was detected as broken.
Description:
As a Site Reliability Engineer at Echo360, you will ensure the reliability, scalability, cost efficiency, and security of our cloud infrastructure.
You will proactively prevent incidents and maintain adherence to SLOs and SLAs.
Your responsibilities include designing and implementing automated monitoring and alerting systems to detect potential issues early.
You will collaborate with development teams to ensure seamless deployments and rollbacks.
Conducting failure testing to enhance system resilience will be part of your role.
You will optimize performance and automate infrastructure provisioning using Terraform and CI/CD pipelines.
Enforcing security best practices, IAM policies, and secrets management is essential.
You will engage in incident response, post-mortem analysis, and continuous improvement initiatives.
You are expected to mentor junior team members and stay up to date on emerging technologies and best practices in site reliability engineering.
Experience with monitoring tools like CloudWatch, Datadog, Prometheus, and Grafana is required.
You will help drive a culture of automation and efficiency in a fast-paced, agile environment.
Requirements:
You must have 5+ years of experience as a Site Reliability Engineer or in a similar role.
A strong understanding of AWS cloud services (including DynamoDB, S3, CloudSearch, OpenSearch, EKS, ECS, and EC2) and related data technologies such as MySQL, Kafka, and Presto is necessary.
Experience with infrastructure automation tools like Ansible, Terraform, or CloudFormation is required.
You should have experience with monitoring and alerting tools such as CloudWatch, Datadog, Prometheus, Grafana, Kibana, and PagerDuty.
Familiarity with GitHub Actions, CI/CD pipelines, and deployment strategies is essential.
Strong problem-solving and analytical skills are a must.
Excellent communication and collaboration skills are required.
You should be able to work independently and take ownership of complex tasks.
A passion for technology and a desire to learn and grow is important.
Experience with Jenkins, PostgreSQL, and MongoDB is preferred.
Knowledge of cloud cost optimization and of security best practices and tooling is necessary.
Experience working in a fast-paced, agile environment is required.
Familiarity with Rancher, Cattleprod, and TeamCity is a plus.
Benefits:
Echo360 offers comprehensive benefits including medical, dental, vision, life, and disability insurance.
A 401(k) plan with company match is provided.
The company has an unlimited PTO policy.
Echo360 promotes a diverse and inclusive workplace, ensuring equal employment opportunities for all.