IOG is a technology company focused on blockchain research and development, emphasizing peer-reviewed research and formal methods for security, scalability, and sustainability.
The company aims to advance the capabilities and adoption of blockchain technology globally through projects in decentralized finance (DeFi), governance, and identity management.
As a Senior Site Reliability Engineer (SRE), you will shape the reliability and performance of systems across cloud infrastructure.
You will design and implement solutions to improve service reliability, automate routine tasks, and facilitate collaboration between development and operations teams.
Responsibilities include designing, building, and maintaining scalable systems on AWS, managing Kubernetes clusters, automating deployments using GitOps principles, and implementing CI/CD pipelines.
You will also develop automation tools, implement monitoring solutions, participate in on-call rotations, and lead incident response efforts.
The role requires effective communication of technical solutions and incident retrospectives to both technical and non-technical stakeholders.
You will evaluate and adopt new technologies, document processes, and continuously improve delivery practices and engineering standards.
Requirements:
You must have 7+ years of experience in SRE, DevOps, or a related role.
A strong understanding of SRE best practices, architectures, and methods is required.
Good knowledge of resiliency patterns and cloud security is essential.
Strong programming proficiency in Python, Golang, or JavaScript is necessary, and Rust experience is advantageous.
Demonstrated experience with AWS and modern cloud architectures is required.
Proficiency in Helm, Terraform, and CI/CD tools such as GitHub Actions and ArgoCD is necessary.
Hands-on experience with Kubernetes/EKS and GitOps methodologies is required.
A proven track record with monitoring and observability tools such as Prometheus and OpenTelemetry is essential.
Blockchain experience is advantageous, providing a unique perspective on distributed systems and security.
Exceptional problem-solving skills and the ability to translate vague requirements into clear plans are necessary.
You should be able to engage in technical discussions and participate in decision-making processes.
Experience working within an Agile environment and with a distributed team is required.
Strong communication and collaboration abilities are essential for working across different teams.
A proactive and innovative mindset with a passion for continuous improvement and operational excellence is necessary.
Benefits:
The position offers remote work flexibility.
Laptop reimbursement is provided to support your work setup.
A new starter package is available to buy hardware essentials such as headphones and monitors.
Learning and development opportunities are offered to enhance your skills.
Competitive paid time off (PTO) is provided to ensure work-life balance.