Remote Site Reliability Engineer (Internal Engineering) (Remote)

Please let KnowBe4 know you found this job on RemoteYeah. This helps us grow 🌱.

Description:

  • The Internal Site Reliability Engineer (SRE) ensures the reliability, scalability, and performance of internal systems and infrastructure.
  • This role involves monitoring, automation, incident management, and maintaining self-hosted platforms to support smooth development operations.
  • The Internal SRE works closely with cross-functional teams to manage GitLab CI/CD workflows and cloud infrastructure on AWS.
  • The position emphasizes proactive problem-solving, automation, and collaboration to continuously improve system stability and efficiency.
  • Responsibilities include managing and maintaining GitLab environments to ensure high availability and security.
  • The SRE will design and implement CI/CD pipelines to automate software delivery.
  • Monitoring and troubleshooting system performance issues using observability tools like Prometheus, Grafana, or Datadog is required.
  • Collaboration with development teams to align infrastructure efforts with project needs and timelines is essential.
  • The role involves building and maintaining infrastructure as code (IaC) solutions using tools like Terraform and Ansible.
  • Managing AWS services, including ECS, S3, API Gateway, DynamoDB, RDS, IAM, and VPC, is part of the job.
  • Participation in incident response, including root cause analysis and post-incident reviews, is expected.
  • Automating manual tasks to improve operational efficiency and reduce technical debt is a key responsibility.

Requirements:

  • A Bachelor’s degree in Computer Science, Information Technology, or a related field is required.
  • Equivalent work experience in SRE, DevOps, or infrastructure management may substitute for formal education.
  • Experience managing and securing self-hosted GitLab environments is necessary.
  • Expertise in designing and maintaining automated pipelines for continuous delivery is required.
  • Strong knowledge of AWS services, including ECS, S3, API Gateway, DynamoDB, RDS, IAM, VPC, and Lambda, is essential.
  • Proficiency in Terraform, Ansible, or similar tools for Infrastructure-as-Code is required.
  • Experience with Prometheus, Grafana, Datadog, or other observability platforms is necessary.
  • Proficiency in Python, Bash, or other scripting languages to automate tasks is required.
  • The ability to lead incident response efforts and conduct root cause analysis is essential.
  • Strong interpersonal skills to work effectively across teams and with stakeholders are required.

Benefits:

  • KnowBe4 has been recognized as a best place to work for women, for millennials, and in technology for four consecutive years.
  • The company has been certified as a "Great Place To Work" in 8 countries.
  • Employees enjoy a welcoming workplace that encourages them to be themselves.
  • The company promotes continuous professional development and radical transparency.
  • There are opportunities for team engagement through activities like team lunches, trivia competitions, and local outings.