We are seeking an experienced Senior DevOps Engineer to join our dynamic and innovative team.
As a DevOps engineer, you will play a key role in ensuring the reliability, availability, and performance of our systems and services.
You will work closely with cross-functional teams to build and maintain a robust and scalable infrastructure while championing best practices for reliability, automation, performance optimization, and monitoring and alerting.
You need advanced or fluent English to communicate with different teams and clients throughout the workday.
Key responsibilities include leading incident response efforts, managing critical incidents to resolution, conducting post-incident analyses, and implementing preventive measures.
Working with the team, you will identify and address performance bottlenecks and optimize system performance to meet service-level objectives (SLOs).
Collaborate on capacity planning efforts to ensure that systems can handle current and future growth, and participate in capacity forecasting and resource allocation.
Develop and maintain infrastructure as code (IaC) using tools like Terraform and automate routine operational tasks to improve efficiency and reduce manual intervention.
Implement and enhance monitoring, alerting, and logging systems to proactively detect issues, conduct root cause analysis, and ensure system health.
Collaborate with development, operations, and other teams to bridge the gap between development and production environments, fostering a collaborative culture that improves automation, efficiency, delivery, and software quality.
Maintain detailed documentation of systems, processes, and configurations, and contribute to knowledge sharing within the team.
Requirements:
Excellent written and verbal communication skills in English are required.
Proven experience in a similar DevOps or SRE role, with a strong focus on incident response, performance optimization, and automation, is necessary.
Proficiency in at least one programming language (e.g., Python, Go, Java) for scripting and automation tasks is essential.
Experience with cloud computing platforms (the client uses Azure) and containerization technologies (e.g., Docker, Kubernetes) is required.
In-depth knowledge of infrastructure as code (IaC) principles and tools is necessary.
Strong expertise in implementing and managing monitoring and alerting solutions (e.g., Prometheus, Grafana, Datadog, ELK Stack) is required.
Excellent problem-solving and troubleshooting skills, with a deep understanding of system and network fundamentals, are necessary.
Experience with GitLab and/or Bitbucket and with continuous integration/continuous deployment (CI/CD) pipelines (Jenkins with Groovy) is required.
Benefits:
Health and dental insurance are provided.
A meal and food allowance is included.
Childcare assistance is available.
Extended paternity leave is offered.
Partnerships with gyms and health and wellness professionals are available via Wellhub (Gympass) and TotalPass.
Profit Sharing and Results Participation (PLR) is included.
Life insurance is provided.
Access to a continuous learning platform (CI&T University) is available.
A discount club is offered.
A free online platform dedicated to physical, mental, and overall well-being is included.
Pregnancy and responsible parenting courses are available.
Partnerships with online learning platforms are provided.