Remote Senior Site Reliability Engineer

Description:

The Senior Site Reliability Engineer will be responsible for developing and maintaining advanced observability solutions within the SRE Cloud Space team.
Responsibilities include leading efforts in blackbox monitoring, implementing synthetic tests, improving platform reliability, and developing anomaly detection mechanisms.
The role spans both on-premises and Google Cloud Platform (GCP) environments and uses tools such as Prometheus, Dynatrace, and OpenShift.
Collaboration with various teams for effective incident management and response is essential.

Requirements:

Bachelor's degree in Computer Science, Engineering, or equivalent experience.
3+ years of experience in DevOps and Site Reliability Engineering, focusing on automation, infrastructure as code, and CI/CD practices.
3+ years of programming experience with a strong emphasis on Golang development.
Proficiency in monitoring tools such as Dynatrace, Prometheus, ELK, Splunk, or similar.
Experience with Google Cloud Platform (GCP) and on-premises environments, particularly OpenShift.
Familiarity with container orchestration technologies like Kubernetes (K8s) and OpenShift.
Demonstrable experience in designing scalable and resilient systems with cloud-native principles.

Benefits:

Full-time position with a competitive salary.
Opportunity to work remotely from Colombia or Costa Rica.
Hands-on work with tools such as Prometheus, Dynatrace, and OpenShift.
Cross-team collaboration on incident management and resolution.
Continuous learning and development in the field of observability and site reliability engineering.