Gauss Labs is seeking a highly skilled Site Reliability Engineer to join its team in Vancouver.
The SRE will play a critical role in ensuring the reliability, performance, and scalability of the company's industrial AI platform.
Responsibilities include building and maintaining a robust solution that supports the growing business at customer sites.
The role requires a high level of technical expertise, a collaborative mindset, and a strong desire to continuously improve systems and processes.
Key responsibilities include creating and maintaining robust monitoring systems, participating in on-call rotations, developing automation tools, forecasting resource needs, optimizing system performance, collaborating with various teams, focusing on customer satisfaction, and driving continuous improvement.
Requirements:
A Bachelor's degree in computer science, engineering, or a related discipline is required.
Candidates must have 5+ years of industry experience as a Site Reliability Engineer.
Experience with cloud platforms such as AWS, GCP, or Azure is necessary.
Proficiency in containerization technologies like Docker and Kubernetes is required.
Familiarity with observability and alerting tools such as Prometheus, Grafana, Elasticsearch, and Jaeger is essential.
Candidates should have experience with scripting languages, specifically Python and Bash.
A working knowledge of GitHub, GitHub Actions, and CI/CD concepts is required.
Experience in ticket management, issue resolution, and troubleshooting is necessary.
Strong problem-solving and troubleshooting skills are essential.
Excellent customer communication and interpersonal skills, with fluency in verbal and written English, are required.
Benefits:
The position offers the opportunity to work in a dynamic and innovative environment focused on industrial AI.
Employees will have the chance to collaborate with talented professionals across various disciplines.
The role provides opportunities for continuous learning and professional development.
Gauss Labs promotes a culture of continuous improvement, allowing employees to contribute to enhancing system reliability and performance.
The hiring process includes an application review, a phone interview, a virtual onsite interview, and a VP/Core Value interview, ensuring a thorough selection process.