Remote Staff Engineer - Site Reliability at Nagarro

Description:

The company is seeking a Staff Engineer specializing in Site Reliability to work remotely in Mexico.
The role calls for an experienced L3 SRE responsible for a business-critical SaaS application.
The position requires the ability to handle L3 tasks across the full stack, including infrastructure, back end, and front end, before escalating to the engineering business unit.
The candidate should be capable of automating SRE tools to offer proactive L3 support in alignment with the tech monitoring strategy.
The role demands working under pressure on business-critical applications and communicating effectively with various stakeholders during troubleshooting.

Must have expertise in Kubernetes, GitHub Actions, Terraform, and AWS.
Strong communication skills are essential.
Prior experience in incident and problem management is required.
Familiarity with multitenant applications is necessary.
Solid understanding of networking concepts such as TCP/IP, DNS, routing, VPCs, subnets, firewalls, load balancing, and TLS/SSL is crucial.
Proficiency in CI/CD pipelines (e.g., Jenkins, GitHub Actions) and version control is a must.
Knowledge of Python, React/Next.js, and monitoring and logging tools such as Grafana, Prometheus, Loki, or ELK for analyzing resource utilization and application performance is needed.
Experience with AWS, especially EKS, serverless, queues, and various databases, is preferred.
Solid knowledge of Kubernetes is required.

Full-time position with the flexibility to work remotely from Mexico.
Opportunity to work for a Digital Product Engineering company that is scaling rapidly.
Chance to collaborate with a diverse team of over 19,000 experts across 33 countries.
Dynamic and non-hierarchical work culture.
Opportunity to work on products, services, and experiences that inspire, excite, and delight.