Remote Site Reliability Engineer (SRE)

Please, let dLocal know you found this job on RemoteYeah. This helps us grow 🌱.

Description:

  • dLocal enables the biggest companies in the world to collect payments in 40 countries in emerging markets.
  • Global brands rely on dLocal to increase conversion rates and simplify payment expansion.
  • As a Site Reliability Engineer (SRE), you will focus on designing and implementing systems that are highly resilient, scalable, and reliable.
  • You will work on mission-critical applications for major customers such as Netflix, Amazon, Nike, and Facebook.
  • Your responsibilities will include developing quality gates based on production-level service level objectives (SLOs) to detect issues earlier in the development cycle.
  • You will automate build testing and validation using service-level indicators (SLIs) and SLOs.
  • You will influence architectural decisions during initial design stages to ensure resiliency and scale at the outset of software development.
  • You will design processes, playbooks, and checklists for other engineers to follow during and after incidents.
  • You will write postmortems and conduct technical after-action reviews to understand root causes and propose system improvements.
  • You will interact with members from almost all teams across the business to understand their monitoring, alerting, and SLO/SLA requirements.
  • You will automate the provisioning of monitoring tools and rules using Terraform and Ansible/Chef.
  • You will design baseline requirements for new and existing services to ensure consistent and accurate monitoring of dLocal infrastructure and code.
  • You will monitor both the technical health and the security health of dLocal infrastructure and systems.
  • You will optimize the signal-to-noise ratio for alerting to ensure only actionable alerts are received.

Requirements:

  • You must have over 3 years of experience as a Site Reliability Engineer or in a very similar role.
  • Experience with monitoring tools such as New Relic, Datadog, and Nagios is required.
  • You should have experience working with tools such as Jira, PagerDuty, and Confluence, and integrating them into automated workflows through their APIs.
  • Familiarity with CI/CD tools such as GitHub Actions, Jenkins, Spinnaker, ArgoCD, or similar is necessary.
  • Knowledge of security best practices and infosec tooling is essential, as you will be writing systems to monitor for breaches and vulnerabilities.
  • Strong communication skills are required.
  • You should possess problem-solving skills and be detail-oriented.
  • A highly analytical mindset is necessary.
  • The ability to collaborate across cross-functional teams is important.
  • Cloud experience, particularly with AWS, is highly advantageous as most systems will integrate with AWS at some level.
  • Experience with Infrastructure as Code (IaC) using a tool like Terraform is highly advantageous.
  • Configuration as Code (CaC) experience with a tool like Ansible, Chef, or Salt is highly advantageous.
  • Database knowledge, particularly in terms of performance and SQL syntax, is highly advantageous.

Benefits:

  • You will be part of a flexible, dynamic, remote-first culture with travel, health, and learning benefits.
  • You will work with a global team of 900+ teammates from 25+ nationalities, which offers an opportunity for international career development.
  • You will have the chance to impact millions of people's daily lives through your work.
  • dLocal fosters a culture of building and facing challenges, which will allow you to thrive in your role.