Please, let Kumo know you found this job
on RemoteYeah.
This helps us grow 🌱.
Description:
The Cloud Infrastructure team at Kumo manages the Kubernetes-based, cloud-native Kumo AI platform.
They define service level objectives, ensure capacity, maintain cost visibility, and uphold security compliance for the Multi-Cloud Platform.
As a key team member, you will architect scalable systems for the Kumo platform, making it the top choice for Big Data and AI workloads.
You will design the platform to handle large datasets, enhancing productivity for engineers and users.
Collaborating with ML scientists, product engineers, and leaders, you will influence how ML technology is scaled, develop tools for speed, and craft full-stack experiences.
Engineers at Kumo wear many hats, leading the design of core systems from scratch and shaping product direction.
You will manage foundational work, including model lifecycles, ML Ops, CI/CD, and deployment strategies.
You will build and extend components of the core Kumo Cloud Infrastructure and Kumo infrastructure.
You will define a culture of engineering excellence and operational efficiency, especially as it relates to development and productization.
You will build and automate CI/CD pipelines and release tooling to support continuous delivery and true zero-downtime deployments across different cloud providers, using the latest cloud-native technologies.
You will work on advanced tools developed for the world’s leading cloud-native machine learning engine that uses graph deep learning technology.
You will develop the infrastructure microservices for features such as usage tracking, diagnostics, monitoring, and alerting at cloud scale.
You will lead automation efforts to streamline global deployments.
You will build the Kumo ML Ops platform, which will be able to monitor data drift, track model versions, report on production model performance, alert the team to any anomalous model behavior, and run programmatic A/B tests on production models.
Requirements:
A BS/MS in Computer Science or a related field is required; a PhD is preferred.
You must have 5-7+ years of experience managing Kubernetes (e.g., EKS, GKE, AKS, or open-source distributions) in large-scale production environments, with deep knowledge of Kubernetes internals, controllers, operators, networking, and connectivity.
You should have 5-7+ years of experience building cloud-native infrastructure across AWS, Azure, and GCP.
You need 5-7+ years of experience developing platform engineering services using tools like Traefik, Istio/Envoy, and Calico/Tigera.
You must have 5-7+ years of experience writing production code in Python, Go, Rust, or similar languages.
Hands-on experience with Infrastructure-as-Code (IaC) tools such as Terraform, CloudFormation, Ansible, and Chef, as well as Bash scripting, is required.
Experience in architecting large-scale distributed systems for B2B SaaS applications is necessary.
A strong background in productionizing cloud applications, including Docker and Kubernetes, is essential.
You should have experience with CI/CD pipelines, advanced packaging, versioning, deployment orchestration, and infrastructure provisioning strategies.
Benefits:
Employees will receive stock options.
Competitive salaries are offered.
Medical insurance is provided.
Dental insurance is included.