The Site Reliability Engineer position is a remote role based in Canada, posted by Jobgether on behalf of Foundant Technologies.
The engineer will maintain the performance, scalability, and reliability of cloud-based SaaS products used by mission-driven organizations.
Responsibilities include collaborating with engineering, product, and operations teams to improve system health, automate processes, and ensure high availability.
The role involves ensuring high availability and optimal performance of production and staging environments by monitoring system health and responding to incidents promptly.
The engineer will lead incident response, conduct post-mortems, and implement corrective actions to prevent recurrence.
Automating repetitive tasks such as deployments, scaling, and maintenance with tools and scripts is a key responsibility.
The engineer will work with cross-functional teams to manage cloud infrastructure changes and deploy new features smoothly.
The engineer will develop and maintain robust monitoring, logging, and alerting systems for real-time visibility and proactive issue resolution.
Planning for system capacity and scalability based on usage trends and forecasted growth is essential.
The role includes applying security best practices across infrastructure and supporting compliance efforts in collaboration with security teams.
The engineer is expected to continuously analyze and optimize system performance, identifying bottlenecks and improving throughput.
Contributing to internal documentation and fostering knowledge sharing to support operational excellence across teams is part of the job.
Requirements:
A minimum of 3 years of experience as a Site Reliability Engineer, DevOps Engineer, or in a similar role in a SaaS or cloud-based environment is required.
Expertise in Microsoft Azure services such as App Services, SQL, Functions, DevOps, and Azure Monitor is necessary.
Familiarity with AWS services like EC2, ECS, Lambda, and CloudWatch is a plus.
Strong knowledge of infrastructure-as-code tools like Terraform or CloudFormation is required.
Experience with Docker, Kubernetes, and other container orchestration frameworks is essential.
Proficiency in monitoring and APM tools such as Azure Monitor, Application Insights, Datadog, or ELK is needed.
Candidates must be skilled in scripting languages such as PowerShell, Python, or Bash.
An understanding of CI/CD pipelines, cloud networking, web application deployment, and database performance tuning is required.
Strong analytical and troubleshooting skills in distributed systems are necessary.
Clear and effective communication skills with the ability to explain technical concepts to diverse stakeholders are essential.
Candidates must be legally eligible to work in Canada.
Certifications such as Azure Solutions Architect Expert or AWS Solutions Architect are a plus.
Experience with large-scale distributed systems and cloud security practices is highly valued.
Benefits:
The position offers competitive compensation with performance-based reviews.
The role offers remote-first flexibility with optional access to office hubs.
A flexible PTO policy and a focus on work-life balance are provided.
Lifestyle reimbursements and well-being initiatives, including mindfulness and fitness support, are included.
Tuition reimbursement and a strong commitment to personal and professional development are offered.
Opportunities for cross-functional collaboration across a merged organization are available.
Internal mobility pathways for career growth and leadership opportunities are provided.
The company fosters a culture of autonomy, recognition, and mission-driven innovation.