Description:
Stack AV is developing AI and autonomous systems for the trucking industry, focusing on safety, reliability, and efficiency.
Site Reliability Engineers (SREs) ensure production systems meet service-level objectives through observability and automation.
The role involves maintaining the compute platform for large-scale autonomous systems development and supporting engineers in running compute- and data-intensive workloads.
Requirements:
Fundamental understanding of Linux internals, TCP/IP networking, and storage subsystems.
Strong experience with Kubernetes and container orchestration in production environments.
Ability to guide teams on scaling services within budget constraints.
Experience with cloud-native and open-source tools like Kubernetes, etcd, Prometheus, and OpenTelemetry.
Strong communication skills and ability to work in a diverse, distributed team.
Benefits:
Opportunity to work in a diverse and inclusive environment.
Contribute to innovative solutions in the autonomous technology sector.