Remote Senior Site Reliability Engineer

at Nexthink

Description:

  • Nexthink is a leader in digital employee experience management software, providing IT leaders with insights to diagnose and fix issues impacting employees.
  • The company is seeking a Senior Site Reliability Engineer to enhance infrastructure and improve deployment, monitoring, and scaling of systems.
  • The SRE team collaborates with over 50 Product Engineering teams and other technical teams to understand reliability requirements and implement solutions.
  • Responsibilities include managing cloud-native systems, operating Kubernetes clusters, designing infrastructure for a multi-tenant SaaS platform, and defining SLOs and SLAs.
  • The role involves developing infrastructure-as-code, building internal tools for operational efficiency, monitoring applications, and participating in on-call rotations.
  • The engineer will act as an Incident Commander, drive incident response processes, and work closely with software engineers to embed reliability principles into service design.

Requirements:

  • At minimum, a Bachelor’s degree in Computer Science or equivalent practical experience is required.
  • Candidates should have 5+ years of experience as a Site Reliability Engineer or Platform Engineer with strong knowledge of software development best practices.
  • Strong hands-on experience with public cloud services (AWS, GCP, Azure) and with supporting SaaS products is necessary.
  • Proficiency in programming or scripting languages (e.g., Python, Go, Bash) and experience with infrastructure-as-code tools (e.g., Terraform) are required.
  • Candidates must have experience with Kubernetes, container-based deployment, and multi-tenant microservices architectures.
  • Familiarity with CI/CD pipelines and tools, as well as managing monitoring solutions, is essential.
  • Comfort with participating in a rotating on-call schedule and managing critical incidents is expected.
  • Strong system-level troubleshooting skills and a proactive mindset toward incident prevention are necessary.
  • A deep understanding of Linux systems, networking, and common troubleshooting practices is required.
  • Knowledge of zero-downtime deployment strategies and exposure to compliance standards are a plus.
  • Excellent problem-solving skills, a collaborative mindset, and strong communication skills in English are essential.

Benefits:

  • The position offers a permanent contract and a competitive compensation package, including stock options.
  • Employees enjoy private health insurance and daily meal vouchers fully covered by the company.
  • A hybrid work model is available, balancing office and remote work, with structured onboarding for new hires.
  • Flexible hours and unlimited vacation are provided, along with three company-paid volunteer days.
  • A gym subscription reimbursement of up to 25 EUR per month is included.
  • The company offers a flexible compensation plan covering kindergarten and transport tickets.
  • Reimbursement of up to 50% for English and Spanish classes is available.
  • Employees can enjoy fresh fruit, cookies, and occasional soft drinks at the office.
  • Regular company and team events, such as team-building activities and Christmas parties, are organized.
  • Referral bonuses are paid once a referred hire completes three months of continuous employment.
  • A relocation package is offered for candidates moving from another country.