Remote SRE (Site Reliability Engineer)

Please, let Solvd know you found this job on RemoteYeah. This helps us grow 🌱.

Description:

  • Solvd Inc. is a premier software engineering company with 8 offices globally and over 800 international employees.
  • The company has over 12 years of experience and helps clients create software that improves operations and opens new markets.
  • Solvd Inc. serves a roster of digital-native enterprise clients, including major brands in retail and social media.
  • The company is seeking a Site Reliability Engineer to join its growing team.
  • Responsibilities include collaborating with product, engineering, and operations teams to enhance the reliability, scalability, and performance of infrastructure and services.
  • The role involves overseeing the end-to-end management of production systems to ensure high availability and rapid recovery from failures.
  • The engineer will develop and maintain SRE best practices through automation, monitoring, and alerting to minimize system downtime.
  • Responsibilities also include creating and managing infrastructure-as-code (IaC) layers, scripts, deployment frameworks, and tools that keep environments efficient.
  • The engineer will work closely with the software engineering team to design and implement monitoring and alerting systems.
  • Incident response and root cause analysis for critical issues will be part of the role, focusing on eliminating causes of outages or poor performance.
  • The engineer will be responsible for the performance and scalability of AWS environments, ensuring they meet service level objectives (SLOs).
  • Providing expertise during client meetings to address reliability and scalability questions is also expected.
  • The role includes maintaining comprehensive documentation on system architecture, processes, and runbooks.
  • Engaging in capacity planning, disaster recovery exercises, and postmortem reviews for continual improvement in system resilience is required.

Requirements:

  • A Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience, is required.
  • At least 5 years of professional experience in a Site Reliability Engineering (SRE), DevOps, or similar role is necessary.
  • Strong expertise in Amazon Web Services (AWS) is required, with AWS certifications being a plus.
  • Proficiency in infrastructure-as-code (IaC) tools like CloudFormation or Terraform is essential.
  • The candidate must be skilled in at least one programming language such as Python, Java, or Go, with experience in scripting for automation and systems management.
  • Expertise in automating cloud-native technologies and provisioning infrastructure across large environments is required.
  • Proven experience in building CI/CD pipelines and automating deployment processes with tools like Jenkins, GitLab, or AWS CodePipeline is necessary.
  • Hands-on experience with containerization technologies, such as Docker and Kubernetes, is required for managing microservices-based architectures.
  • A deep understanding of Linux systems, networking, and security best practices is essential.
  • The candidate must demonstrate the ability to work with monitoring tools (e.g., Prometheus, Grafana) and troubleshoot live systems.
  • Excellent communication skills are required, with the ability to collaborate with cross-functional teams and explain complex concepts to clients and non-technical stakeholders.

Benefits:

  • Working with a premier software engineering company that has a global presence and a diverse team.
  • Opportunities to collaborate with top brands in retail and social media.
  • The chance to enhance skills in Site Reliability Engineering and cloud technologies.
  • A dynamic work environment that encourages continual improvement and innovation.
  • The opportunity to engage in capacity planning and disaster recovery exercises.
  • A role that offers the chance to work with cutting-edge technologies and methodologies in software engineering.