Remote Sr. Site Reliability Engineer

This job post is closed and the position is probably filled. Please do not apply. It was closed automatically after the apply link was detected as broken.

Description:

  • We are seeking a Site Reliability Engineer to join our tech startup in the infrastructure and authorization space.
  • As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, availability, and performance of our systems.
  • You will be responsible for designing, implementing, and maintaining scalable infrastructure solutions to support our growing customer base.
  • This is an exciting opportunity to work in a fast-paced environment and contribute to the success of a company bringing a Google-inspired authorization system to companies around the globe.
  • Your responsibilities will include designing, implementing, and maintaining highly available and scalable infrastructure solutions for our projects, products, and customers.
  • You will monitor and analyze system performance, identifying and resolving bottlenecks and issues to ensure optimal performance and reliability.
  • You will automate infrastructure deployment and configuration management processes.
  • You will continuously improve system reliability, security, and efficiency through proactive monitoring, capacity planning, and performance tuning.
  • You will troubleshoot and resolve complex infrastructure and application issues in production and test environments.
  • You will collaborate with software engineering teams to design and implement systems that are resilient, scalable, and secure.
  • You will participate in the on-call rotation and respond to production incidents in a timely manner.
  • You will document system configurations, troubleshooting procedures, and operational guidelines.

Requirements:

  • Proven experience as a Site Reliability Engineer or in a similar role is required.
  • A strong understanding of networking, operating systems, and cloud infrastructure is necessary.
  • Experience with Site Reliability Engineering, System Design, and Distributed Computing is essential.
  • You should have experience with various programming languages and runtimes, including Node.js, Java, Python, Ruby, and Go.
  • Experience with containerization technologies such as Docker and Kubernetes is required.
  • Knowledge of infrastructure-as-code tools like Terraform and Pulumi is necessary.
  • Familiarity with monitoring and logging tools, such as Prometheus, Grafana, and the ELK stack, is important.
  • Experience with lower-level implementation details of relational databases is preferred, with a bonus for experience with distributed SQL databases like Google Cloud Spanner or CockroachDB.
  • Experience working with Git and GitHub is required.
  • Experience with continuous integration and deployment systems is necessary.
  • Strong problem-solving and troubleshooting skills are essential.
  • Excellent communication and collaboration abilities are required.

Benefits:

  • This position offers the opportunity to work remotely from the U.S. or EU.
  • You will be part of a dynamic and innovative tech startup environment.
  • You will have the chance to contribute to the development of a cutting-edge authorization system.
  • The role provides opportunities for professional growth and development in the field of Site Reliability Engineering.
  • You will work with a talented team of professionals in a fast-paced and collaborative setting.