This job post is closed and the position is probably filled. Please do not apply.
🤖 Automatically closed by a robot after the apply link was detected as broken.
Description:
The company is seeking a Staff Engineer specializing in Site Reliability to work remotely in Mexico.
The role calls for an experienced L3 SRE responsible for a business-critical SaaS application.
The position requires the ability to handle L3 tasks across the full stack, including infrastructure, back end, and front end, before escalating to the engineering business unit.
The candidate should be capable of automating SRE tooling to provide proactive L3 support in line with the technical monitoring strategy.
The role involves working under pressure on business-critical applications and communicating effectively with various stakeholders during troubleshooting.
Requirements:
Must have expertise in Kubernetes, GitHub Actions, Terraform, and AWS.
Strong communication skills are essential for effective collaboration.
Prior experience with incident and problem management is required.
Familiarity with multitenant applications is necessary.
Solid understanding of networking concepts such as TCP/IP, DNS, routing, VPCs, subnets, firewalls, load balancing, and TLS/SSL is crucial.
Experience with CI/CD pipelines (e.g., Jenkins, GitHub Actions) and version control is a must.
Proficiency in Python and React/Next.js, along with monitoring and logging tools such as Grafana, Prometheus, Loki, or ELK, is required.
Experience with AWS, especially EKS, serverless, queues, and various databases, is preferred.
Solid knowledge of Kubernetes is essential for the role.
Benefits:
Full-time position with the flexibility to work remotely from Mexico.
Opportunity to work for a Digital Product Engineering company that is rapidly scaling.
Chance to collaborate with a diverse team of over 19,000 experts across 33 countries.
Dynamic and non-hierarchical work culture.
Opportunity to work on products, services, and experiences that inspire, excite, and delight.