Remote Staff Site Reliability Engineer

Apply now
Please, let Crisis Text Line, Inc. know you found this job on RemoteYeah. This helps us grow 🌱.

Description:

  • The Staff Site Reliability Engineer (SRE) is a remote position that reports to the Senior Engineering Manager of SRE/Infrastructure.
  • This role is a key technical leader responsible for ensuring the reliability, scalability, and security of the platform.
  • The SRE will architect, build, and maintain tooling that empowers software engineering teams, and will manage the infrastructure supporting the Crisis Text Line service.
  • Collaboration with developers is essential to drive performance optimization, implement best practices, and ensure a secure environment.
  • The position focuses on enhancing engineer productivity through automation and streamlined workflows.
  • Key responsibilities include leading and mentoring a team of 5 SREs, enforcing security best practices, designing and maintaining AWS infrastructure, optimizing application performance, developing monitoring systems, automating tasks, responding to incidents, and conducting security audits.
  • The SRE will also communicate expectations and progress clearly, provide mentorship, write high-quality code, manage time effectively, and participate in retrospectives to improve processes.

Requirements:

  • A Bachelor's degree in Computer Science, Engineering, or a related field is required; a Master's degree is preferred.
  • Proven experience as a Staff SRE or in a similar role, with a strong focus on infrastructure and DevOps in a software delivery capacity, is necessary.
  • Experience maintaining the reliability of online SaaS/PaaS with a 24/7 schedule is required.
  • Proficiency in AWS and infrastructure as code tools such as Terraform or CloudFormation is essential.
  • Strong scripting and automation skills, particularly in Python, along with knowledge of containerization and orchestration tools like Docker and Kubernetes, are required.
  • Experience implementing CI/CD pipelines and observability tools, such as GitHub Actions and Datadog, is necessary.
  • A commitment to ethical practices, data privacy, and security is essential.
  • Solid understanding of network protocols, security principles, and best practices is required.
  • Excellent problem-solving skills and the ability to work under pressure, along with strong communication skills for effective collaboration with cross-functional teams, are necessary.
  • The ability to learn quickly and manage time effectively by focusing on priorities and meeting deadlines is required.
  • An understanding of essential computer science principles, including basic data structures and control structures, is necessary.

Benefits:

  • Crisis Text Line offers 20 paid holidays, including federal holidays, election day, a holiday break from December 24 through January 1, 2 renewal days, and 2 floating holidays.
  • Employees receive flexible paid time off, which includes 15 vacation days, 3 personal days, and 7 sick days.
  • Medical, dental, and vision benefits are provided for the staff member and their family at no cost to the employee.
  • A 403(b) retirement plan with a 3% contribution by Crisis Text Line is available to support financial wellness.
  • Employees are entitled to 12 weeks of paid parental leave after 6 months of employment.
  • A student loan repayment program is available after 2 years of continuous full-time service.
  • Family support is offered through a virtual childcare platform.
  • Stipends and allowances include monthly mental health support, internet service, annual professional development, wellness, and a one-time home office setup allowance in the first year.
  • Note that benefits are only for US-based employees, and international benefits may differ.