Nexthink is seeking a strong Platform Engineer with SRE operations experience to enhance its infrastructure and improve the deployment, monitoring, and scaling of its systems.
As a SaaS provider, Nexthink depends on this role to ensure a seamless, reliable, and scalable experience for customers 24/7.
Responsibilities include designing, building, and maintaining the infrastructure for a multi-tenant SaaS platform with a focus on reliability, security, and scalability.
The engineer will implement and manage cloud-native systems on AWS using top-tier tools and automation.
Operating and enhancing Kubernetes clusters, deployment pipelines, and service meshes to support continuous delivery is essential.
The role involves establishing and enforcing SLOs, SLAs, and error budgets while proactively addressing availability and performance issues.
Developing infrastructure as code using Terraform or similar tools for repeatable and auditable provisioning is required.
The engineer will build platform tooling for automation, monitoring, and provisioning.
A solid understanding of the network stack, cloud topologies, and storage solutions is necessary.
Monitoring system health and application performance using tools like Datadog, Prometheus, and Grafana is part of the job.
The engineer will improve incident response practices and reduce mean time to detect (MTTD) and mean time to recover (MTTR).
Troubleshooting incidents with minimal intervention from other functions is expected.
Participation in a shared on-call rotation to respond to incidents and troubleshoot outages is required.
Collaboration with software engineers to embed reliability and observability into services is essential.
Developing automated runbooks, health checks, and alerting to support reliable operations is part of the role.
Supporting automated testing, canary deployments, and rollback strategies for safe and reliable releases is necessary.
Contributing to security best practices, compliance automation, and cost optimization is expected.
Requirements:
A minimum of a BS in Computer Science or Engineering is required.
At least 5 years of experience in an SRE/platform engineering role supporting SaaS platforms is necessary.
Strong hands-on experience with public cloud services such as AWS, GCP, or Azure is required.
Proficiency with Kubernetes, container-based deployment, and related ecosystems is essential.
Strong programming or scripting skills in languages such as Python, Go, or Bash are required.
Experience with CI/CD pipelines, including tools like GitHub Actions, GitLab CI, or ArgoCD, is necessary.
Familiarity with observability stacks such as Prometheus, ELK/EFK, or Datadog is required.
Comfort with being part of a rotating on-call schedule, including handling critical incidents, is necessary.
Strong system-level troubleshooting skills and a proactive mindset toward incident prevention are essential.
A deep understanding of Linux systems, networking, and common troubleshooting practices is required.
Experience supporting multi-tenant microservices architectures is necessary.
Familiarity with service mesh technologies, such as Istio, is preferred.
Knowledge of zero-downtime deployment strategies, including blue/green and canary releases, is required.
Exposure to compliance standards such as SOC 2, ISO 27001, or HIPAA is preferred, with FedRAMP experience being a plus.
Experience with chaos engineering or resilience testing practices is beneficial.
Benefits:
Employees enjoy flexible hours and unlimited vacation, including 15 days of holidays, 11 company-paid holidays, and 3 extra days for volunteering.
The company offers a hybrid work model that balances office and remote work, with structured onboarding to foster connections and team integration.
Free access to professional training platforms is provided to explore interests and enhance skills.
Up to 16 weeks of paid leave for birthing parents/primary caregivers and 6 weeks for secondary caregivers are available.
A 401(k) plan with up to 4% company matching contributions is offered to help employees grow their retirement savings.
Referral bonuses are available once a referred hire completes three months of continuous employment.
Comprehensive benefits include fully covered health, dental, and vision insurance, as well as access to life insurance, long-term disability, and accidental death/personal loss coverage.