We are seeking an accomplished Infrastructure Engineer II (SRE) to enhance the reliability, scalability, and security of our mission-critical services.
The role involves designing and implementing robust infrastructure while mentoring colleagues across various technology domains.
You will report directly to the SRE Manager and participate in our 24×7 support structure to ensure maximum uptime and performance.
Responsibilities:
You will oversee the reliability, scalability, performance, and security of key production services from initial design to final implementation.
You will collaborate with cross-functional teams to develop and maintain resilient infrastructure.
Providing expert mentorship and guidance on best practices to engineers throughout the organization is a key aspect of the role.
You will contribute to our 24×7 on-call rotation to ensure uninterrupted availability of critical services.
Driving standardization and documentation efforts to promote efficiency, consistency, and knowledge sharing is also part of your duties.
Requirements:
Candidates must have 3–5 years of experience in SRE, DevOps, or Software Engineering roles.
Extensive expertise in Kubernetes at scale and strong knowledge of containerization are required.
Hands-on proficiency in Infrastructure as Code (IaC) tools such as Ansible, Puppet, and Terraform is essential.
Strong coding skills in an Object-Oriented Programming (OOP) language and the ability to develop effective scripting solutions are necessary.
A deep understanding of security principles and best practices across all aspects of infrastructure and services is required.
Expert-level experience with cloud platforms such as AWS and/or GCP is mandatory.
Advanced monitoring and alerting skills with tools such as Prometheus, Grafana, or similar are needed.
A solid understanding of networking fundamentals is required.
Robust experience in Linux or Windows administration is essential.
Familiarity with software delivery automation (CI/CD, SDLC) and static/dynamic application security testing is necessary.
Comprehensive knowledge of SRE principles (SLI, SLO, SLA, toil, uptime, observability) is required.
Experience managing and scaling Elasticsearch is strongly preferred.
Benefits:
We offer a friendly and welcoming environment focused on people, learning, and development.
Employees receive 25 vacation days, with additional days granted at age milestones and for employees with children.
A cafeteria benefit is provided via SZÉP card.
Medicover private health insurance is available for employees and their family members.
The work week is 37.5 hours or less.
Employees are allowed to spend 10% of their time on personal projects, reading groups, or tech talks.
Flexible working arrangements and the option to work from home are available.
An extensive people development program, including access to Udemy, is offered.