Remote Site Reliability Engineer (Remote - Canada)

Apply now
Please, let Jobgether know you found this job on RemoteYeah. This helps us grow 🌱.

Description:

  • As a Site Reliability Engineer (SRE), you will play a key role in designing, implementing, and maintaining scalable infrastructure while ensuring system reliability and efficiency.
  • Your focus will be on automation, performance optimization, and cloud resource management.
  • You will collaborate with cross-functional teams to streamline CI/CD pipelines, enhance monitoring solutions, and support a highly available infrastructure.
  • This position requires a proactive approach to troubleshooting and continuous improvement, ensuring seamless integration of new services while leveraging the latest SRE best practices.
  • You will design, build, and maintain highly scalable cloud infrastructure using Terraform and Terragrunt for automated resource provisioning.
  • You will manage and optimize AWS cloud environments, ensuring security, cost efficiency, and high availability.
  • You will oversee data streaming platforms using Confluent Cloud and Kafka, ensuring reliable data pipelines.
  • You will deploy and manage Redis instances for caching and real-time data processing.
  • You will implement and maintain monitoring and alerting solutions using Prometheus, Grafana, Alertmanager, and Opsgenie.
  • You will enable feature flag management and controlled rollouts with LaunchDarkly.
  • You will manage Kubernetes clusters, using Helm and Kustomize for packaging and resource configuration, Argo CD for continuous deployment, and Istio for the service mesh.
  • You will collaborate with development teams to integrate new services into the infrastructure seamlessly.
  • You will troubleshoot complex system issues to maintain high availability and performance.
  • You will continuously improve automation tools, processes, and methodologies to enhance system scalability.

Requirements:

  • You must have 4+ years of experience in Site Reliability Engineering or a similar role with a strong focus on cloud infrastructure.
  • You should have expertise in Infrastructure as Code (IaC) using Terraform and Terragrunt.
  • You need deep knowledge of AWS cloud services and best practices for scalable and secure architectures.
  • You must have hands-on experience with Confluent Cloud and Kafka for distributed data streaming.
  • Strong experience with Redis for caching and RDS for data storage is required.
  • Proficiency with OpenSearch, Elasticsearch, or ChaosSearch for search and analytics is necessary.
  • You should have advanced knowledge of monitoring tools like Prometheus, Grafana, Alertmanager, and Opsgenie.
  • Experience with LaunchDarkly for feature flag management is essential.
  • Extensive experience managing Kubernetes clusters, including Helm for package management, Argo CD for deployments, and Istio for service mesh configuration, is required.
  • Familiarity with Kustomize for Kubernetes resource configuration is necessary.
  • You must possess strong problem-solving skills and the ability to troubleshoot complex systems in production environments.
  • Excellent communication and collaboration skills within agile teams are required.

Benefits:

  • You will receive a competitive salary based on experience and qualifications.
  • The position offers fully remote work flexibility, with a collaborative team environment.
  • Comprehensive healthcare coverage, including medical, dental, and vision plans, is provided.
  • A retirement savings plan with company matching is available.
  • Flexible paid time off (PTO) is offered to support work-life balance.
  • Professional development opportunities, including training and certifications, are provided.
  • You will have access to cutting-edge technology and opportunities to work on innovative projects.