Description:
KOMOJU (by Degica) is the leading cross-border payment gateway for Japan, powering payments for companies like Steam and TikTok.
The Site Reliability Engineer (SRE) will focus on observability to maintain the reliability and availability of complex systems.
The role involves ensuring that infrastructure is understandable and measurable, detecting issues before they impact users, and improving system performance.
Responsibilities include designing and evolving the observability platform, defining and monitoring SLIs/SLOs, and collaborating with development teams.
The SRE will build and maintain dashboards and alerts, troubleshoot system performance issues, and educate engineering teams on best practices.
Requirements:
Candidates must have 3+ years of experience in SRE roles.
Hands-on experience with observability tools, preferably Datadog, is required.
Proficiency in Terraform is necessary.
A background in software development is essential.
Candidates should be proficient in at least one scripting or programming language, such as Ruby (Rails), Python, Go, or shell scripting.
Experience working with AWS is required.
Familiarity with monitoring design principles and practices is necessary, including the RED (Rate, Errors, Duration) and USE (Utilization, Saturation, Errors) methods, SLIs/SLOs, and alert tuning.
The ability to analyze logs, metrics, and traces to diagnose issues and identify trends is essential.
Benefits:
Degica embraces remote work while also providing office space for those who prefer in-person collaboration.
Employees receive 10 days of regular vacation, plus 5 additional days of summer vacation and 5 additional days of winter vacation.
A paid birthday holiday is included.
A self-learning allowance is provided to help employees keep their skills current.