This job post is closed and the position is probably filled. Please do not apply.
🤖 Automatically closed by a robot after the apply link was detected as broken.
Description:
The company is seeking a Staff Engineer specializing in Site Reliability for a full-time, remote position based in Colombia, in the UCC service region.
The role calls for an experienced L3 SRE responsible for a business-critical SaaS application.
The candidate should be able to work across the full stack, including infrastructure, backend, and front end, before escalating issues to the engineering business unit.
Automating SRE tooling to provide proactive L3 support, in line with the tech monitoring strategy, is essential.
The candidate must be able to work under pressure on business-critical applications and communicate effectively with various stakeholders during troubleshooting.
Requirements:
Must have expertise in Kubernetes, GitHub Actions, Terraform, and AWS.
Strong communication skills are required.
Experience with incident and problem management is necessary.
Familiarity with multitenant applications is essential.
Solid understanding of networking concepts such as TCP/IP, DNS, routing, VPCs, subnets, firewalls, load balancing, and TLS/SSL is required.
Proficiency in CI/CD pipelines (e.g., Jenkins, GitHub Actions) and version control is necessary.
Knowledge of Python, React/Next.js, and monitoring and logging tools such as Grafana, Prometheus, Loki, or ELK is essential.
Experience with AWS, especially EKS, serverless, queues, and various databases, is required.
Solid knowledge of Kubernetes is a must.
Benefits:
Opportunity to work for a Digital Product Engineering company that is scaling rapidly.
Remote work option available for employees.
Dynamic and non-hierarchical work culture.
Chance to collaborate with a global team of 19,000+ experts across 33 countries.
Projects that inspire, excite, and delight.
Room for professional growth and development within the company.