Description:
The Site Reliability Engineer position is a full-time remote role available for candidates in the US, UK, and EMEA time zones.
The ClickHouse Operations team is responsible for managing the ClickHouse cluster used by all teams and customers for data storage and querying.
The role involves automating and maintaining the infrastructure for ClickHouse instances and scaling the cluster on demand.
Responsibilities include automating the provisioning of metal resources on AWS, dynamic provisioning of instances using Terraform, Ansible, and Kubernetes, enhancing visibility into cluster status, and conducting performance investigations with the latest hardware.
Requirements:
Candidates must have proficiency in Python, Kubernetes, and AWS.
Experience in building and operating high-scale complex data storage solutions is required.
A strong interest in, and experience with, ClickHouse or similar OLAP databases is essential, particularly in query performance optimization.
The ability to thrive in a culture of autonomy and self-direction is necessary.
Experience with Terraform and Ansible for infrastructure automation is a nice-to-have.
Benefits:
The position offers generous and transparent compensation along with employee-friendly equity options.
Employees enjoy unlimited time off with a 25-day minimum; the average taken in 2021 was 32 days.
Private medical insurance, including dental and vision, is provided for employees in the US and UK.
Pension and 401(k) contributions with 4% matching are available.
The company offers generous parental, bereavement, and child loss leave.
A training budget and free books are provided for professional development.