Zyte is seeking an experienced Team Lead to manage the Core & MLOps Squad, responsible for building the infrastructure that powers Zyte at scale.
This is a hands-on technical leadership role that requires expertise in MLOps, systems programming, and orchestration.
The role involves designing and evolving the core platform, including Kubernetes, Mesos, GPU scheduling/autoscaling, and distributed compute.
The Team Lead will own the model platform, which includes the model registry, experiment tracking, training orchestration, evaluation, serving, and monitoring.
Responsibilities include building the Golden Path, which consists of reference repos, a scaffold CLI, opinionated CI/CD pipelines, and production-ready defaults.
The Team Lead will operate a secure, multi-tenant model registry and training platform with standardized experiment/evaluation harnesses.
The role includes providing turnkey serving patterns, drift/quality monitoring, and rollback playbooks.
The Team Lead will integrate public/open-source AI capabilities as managed platform services with cost and data-governance guardrails.
The Team Lead will run the squad, covering roadmap and prioritization, delivery, mentoring, and high engineering standards.
The Team Lead will partner with product engineering, Prod Ops, and Security on adoption and rollout plans, and will foster a platform-thinking mindset within the team.
Ownership areas include container orchestration, GPU provisioning & autoscaling, environment & secret management, observability, billing pipeline, and reliability enablement.
Requirements:
At least 5 years of experience building distributed systems and at least 3 years in MLOps/ML platform engineering (or equivalent impact) are required.
Knowledge of Linux/OS internals, networking, concurrency, and performance profiling is essential.
A deep understanding of Kubernetes is required; knowledge of Mesos is a bonus.
Proficiency in building high-performance services in Java, Rust, Go, or C++, as well as strong Python skills, is necessary.
Experience with GPU infrastructure, including scheduling, containerization, and optimization, is required.
A track record of designing and operating model platforms in production is essential.
Demonstrated success in leading technical teams and implementing organization-wide platform solutions is required.
Benefits:
Zyte fosters and nourishes new ideas and brings them to market.
Employees become part of a self-motivated, progressive, multi-cultural team.
The company offers the freedom and flexibility to work remotely from anywhere.
Employees have the opportunity to work with cutting-edge open-source technologies and tools.