This job post is closed and the position is probably filled. Please do not apply.
🤖 Automatically closed by a robot after the apply link was detected as broken.
Description:
Analyze performance using data from APM and distributed telemetry tools to identify sources of instability
Analyze complex systems to minimize downtime and operational surprises
Engage in software engineering and patching to enhance performance, scalability, and reliability
Make infrastructure modifications in data center metal and public cloud environments
Conduct predictive failure analysis and disaster planning
Develop new tools and automation to streamline the DevOps pipeline
Collaborate with other engineering teams
Administer and configure databases and key-value stores with a focus on uptime and performance
Handle incident response and postmortem reports
Requirements:
Hold a STEM degree and possess relevant experience as a Site Reliability Engineer
Demonstrate exceptional problem-solving skills
Have high proficiency in at least one programming language, such as C, C++, Java, Python, or Go
Exhibit strong Unix/Linux environment skills with excellent knowledge of internals
Possess networking knowledge for metal and cloud environments
Have experience in database administration and configuration
Be familiar with DevOps tools such as Terraform, Ansible, Docker, and Kubernetes
Be willing to be on call for monitoring and alerting of core website functions
Benefits:
Work with a strong team of A-players
Be part of a robust engineering culture
Opportunity to impact a highly popular product
Freedom to contribute ideas and make technical decisions
Receive support and guidance from a professional team
Enjoy a flexible working environment
Optional fully remote work with a flexible schedule
Health, Vision, Dental, and Life Insurance for you and dependents with premiums covered