Please let IO Global know you found this job on RemoteYeah.
This helps us grow 🌱.
Description:
IOHK is a technology company focused on Blockchain research and development, emphasizing a scientific approach to ensure security, scalability, and sustainability.
The Site Reliability Engineer (SRE) role is integral to the open-source project, ensuring the reliability, availability, and performance of production systems.
The role combines service operation, systems engineering, and software engineering principles to operate and monitor services.
Responsibilities include designing, writing, and delivering tools and software using Python, Bash, Terraform, or Nix to improve service availability, scalability, and efficiency.
The SRE will engage in the entire lifecycle of services, from inception and design to deployment, operation, and continuous improvement.
The role involves practicing sustainable incident response and promoting blameless postmortems.
Collaboration with development teams is essential to ensure solutions are designed with customer experience, scalability, and performance in mind.
The SRE will analyze system performance and reliability, offering recommendations for enhancement.
The SRE will develop and maintain service-level objectives (SLOs), service-level indicators (SLIs), and error budgets for services.
Participation in on-call rotations to respond to and mitigate service interruptions and technical challenges is expected.
Requirements:
Proficiency in Python, Bash, Terraform, and Nix for DevOps services is required.
Extensive experience with AWS, specifically with services like EKS and RDS, is necessary.
Familiarity with container orchestration, particularly Kubernetes, is essential.
Hands-on experience with PostgreSQL and its deployment on RDS is required.
Knowledge of monitoring tools such as Prometheus, Grafana, and Loki is necessary.
Solid troubleshooting and performance tuning capabilities are essential.
Exceptional communication skills and a strong team collaboration ethic are required.
Experience with CI/CD tools like GitHub Actions, Hydra, or Earthly is necessary.
Strong analytical and troubleshooting skills are essential for the role.
Excellent communication skills are needed to collaborate with development teams, operations, and other stakeholders.
The ability to quickly learn new technologies and adapt to changing environments is required.
High attention to detail is necessary to ensure system reliability and performance.
Benefits:
The position offers remote work flexibility.
There is a laptop reimbursement program available.
New starters receive a package to buy hardware essentials such as headphones and monitors.
Learning and development opportunities are provided to enhance skills.
Competitive paid time off (PTO) is included as part of the benefits.