Remote Staff Engineer - Site Reliability Engineer at Nagarro

Description:

The company is seeking a Staff Engineer - Site Reliability Engineer to work remotely in Mexico.
The role calls for an experienced L3 SRE responsible for a business-critical SaaS application.
The engineer must be able to work across the full stack, including infrastructure, backend, and front-end, before escalating issues to the engineering business unit.
The engineer is expected to automate SRE tooling to provide proactive L3 support in line with the technical monitoring strategy.
The engineer should be able to work under pressure on business-critical applications and communicate effectively with various stakeholders during troubleshooting.

Must have expertise in Kubernetes, GitHub Actions, Terraform, and AWS.
Strong communication skills are essential.
Experience with incident and problem management is required.
Familiarity with multitenant applications is necessary.
Solid understanding of networking concepts such as TCP/IP, DNS, routing, VPCs, subnets, firewalls, load balancing, and TLS/SSL is crucial.
Proficiency in CI/CD pipelines (e.g., Jenkins, GitHub Actions) and version control is needed.
Knowledge of Python, React/Next.js, and monitoring and logging tools such as Grafana, Prometheus, Loki, or ELK is preferred.
Experience with AWS services, especially EKS, serverless, queues, and various databases, is a plus.
Solid knowledge of Kubernetes is required.

Full-time remote position in Mexico.
Opportunity to work for a Digital Product Engineering company that is scaling rapidly.
Dynamic and non-hierarchical work culture.
Chance to collaborate with a global team of over 19,000 experts across 33 countries.
Competitive salary and benefits package.
Opportunity to work on business-critical applications and gain valuable experience in site reliability engineering.