Please, let Capital Markets Gateway know you found this job
on RemoteYeah.
This helps us grow 🌱.
Description:
Capital Markets Gateway (CMG) is seeking a Site Reliability Engineer (SRE) focused on monitoring, observability, and alerting to ensure the reliability, performance, and scalability of its infrastructure and applications.
The SRE will design, implement, and maintain monitoring solutions to provide visibility into system health and performance, proactively detect anomalies, and reduce incident response time.
Key responsibilities include designing and maintaining monitoring and observability solutions using tools like Prometheus, Grafana, Datadog, and OpenTelemetry.
The role involves defining and implementing SLOs, SLIs, and error budgets to measure system reliability, as well as developing dashboards, alerts, and reports for system performance and business metrics.
The SRE will design actionable alerting strategies, integrate alerting systems with Jira, and establish runbooks for on-call teams.
Performance optimization tasks include analyzing system performance metrics, identifying bottlenecks, and conducting load testing and capacity planning.
The role requires identifying opportunities for automation and developing tools to streamline operational processes, including monitoring and alerting systems.
The SRE will collaborate closely with cross-functional teams to understand system requirements and provide technical guidance.
Requirements:
Candidates must be based in Latin America.
A high level of English proficiency (C1 or C2) is required.
Proven experience as a Site Reliability Engineer or in a similar role is necessary.
Proficiency in logging, metrics, and tracing frameworks such as Datadog, Loki, Prometheus, and OpenTelemetry is required.
Experience with cloud platforms, preferably Azure, and infrastructure-as-code tools like Terraform is essential.
Strong programming and scripting skills in Python and Bash are required.
Proficiency in containerization technologies and orchestration tools, including Docker and Kubernetes, is necessary.
A solid understanding of Linux-based systems, networking, and security principles related to containerized applications is required.
Strong problem-solving and troubleshooting skills are essential, along with a passion for resolving complex technical issues.
Excellent communication and collaboration abilities are necessary.
The ability to thrive in a fast-paced, constantly evolving environment is required.
Experience with PostgreSQL monitoring and optimization is a nice-to-have.
Benefits:
The position offers a 2-year contract.
Employees receive 15 days of vacation.
Opportunities to take tech courses and attend conferences are provided.
A top-of-the-line MacBook is offered for work purposes.
Flexible working hours are available to accommodate work-life balance.