Please let DefenseStorm know you found this job on RemoteYeah. This helps us grow 🌱.
Description:
As a Site Reliability Engineer at DefenseStorm, you will play a crucial role in ensuring the reliability, scalability, and performance of our cloud-based services.
You will work on GRID, a high-throughput, data-intensive application that currently handles 250k events per second.
Your responsibilities will include leading the migration of EC2 workloads to ECS and developing DevOps tooling to empower development teams in building and managing containerized applications.
You will advance zero trust security initiatives by implementing a service mesh architecture with technologies such as Istio.
You will enhance the security, scalability, and reliability of AWS cloud-native infrastructure through continuous improvement and innovation.
You will design and implement proactive monitoring and alerting solutions using tools like Prometheus, Grafana, and OpsGenie, leveraging data-driven insights to optimize uptime and mitigate operational risks.
You will uphold SLAs and SLOs by applying SRE best practices, including incident response, post-mortem analysis, and the creation of operational playbooks.
You will build, manage, and scale cloud infrastructure using Infrastructure as Code (IaC) tools such as Terraform.
You will also support SOC 2 and ISO compliance efforts by championing security best practices, streamlining evidence collection, and introducing automation to improve audit processes.
Other duties may be assigned by management as needed.
Requirements:
You must have hands-on experience building and maintaining CI/CD pipelines using tools such as GitHub Actions.
A strong understanding of networking principles and their application in cloud and containerized environments is required.
Proven experience designing, building, and managing cloud infrastructure in AWS is essential.
You should have expertise with Infrastructure as Code (IaC) and deployment automation tools to streamline environment provisioning and management.
Experience running and supporting containerized workloads in production environments is necessary.
Familiarity with observability, monitoring, logging, and tracing tools to ensure system performance, reliability, and visibility is required.
You should have experience using AWS, ECS, Elasticsearch, PostgreSQL, Prometheus, Grafana, GitHub Actions, and Terraform.
Benefits:
DefenseStorm provides equal employment opportunities to all employees and applicants for employment.
The company prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state, or local laws.