Remote Staff Software Engineer - Compute Reliability and Efficiency
This job is closed
This job post is closed and the position is probably filled. Please do not apply.
Automatically closed by a robot after the apply link was detected as broken.
Description:
The Staff Software Engineer position at Reddit focuses on lower-level (Linux and Kubernetes) systems engineering within the Compute Reliability and Efficiency team.
The role involves working on intra-cluster engineering problems related to performance, efficiency, and stability.
Responsibilities include detecting node-level performance characteristics and working on schedulers for resource packing, Kubernetes integrations, and cluster upgrades.
The position requires collaborating with a team to create and maintain the foundational platform for Reddit's infrastructure.
Daily tasks involve performance and reliability analysis on the Linux-based Kubernetes fleet, designing and delivering software to enhance Reddit's Compute Platform, and contributing to the technical and strategic direction of the platform.
Automation of critical development processes and sharing on-call responsibilities with the Compute team are also part of the role.
Requirements:
7+ years of experience in the infrastructure domain, with a focus on lower-level systems like Linux.
Proficiency in Go (preferred), Rust, or Python.
Understanding of kernel primitives, CPU scheduling, userspace concerns, and packet processing.
Experience developing on Kubernetes or similar distributed systems.
Strong troubleshooting skills across the stack, from higher-level orchestration to lower-level runtime concerns.
Experience in designing large systems, scoping work, and building consensus with other engineers.
Excellent communication skills to collaborate effectively with a service-oriented team and company.