Yotta Labs is seeking a GPU Cloud Platform Engineer to join its core infrastructure team.
The role involves designing, deploying, and operating large-scale, multi-cluster GPU infrastructure across data centers and cloud environments.
Responsibilities include ensuring high availability, performance, and efficiency of containerized AI workloads deployed in Kubernetes-based GPU clusters.
The engineer will build and operate large-scale, high-performance GPU clusters, monitor and troubleshoot production issues, and conduct performance testing of multi-node GPU clusters.
The role requires deploying and orchestrating large models across multi-cluster environments using Kubernetes, and implementing elastic scaling and cross-cluster load balancing.
The engineer will participate in the design and development of GPU cluster scheduling and optimization systems and define Kubernetes multi-cluster configuration standards.
Responsibilities also include building a unified multi-cluster management and monitoring system and coordinating with IDC (internet data center) providers to plan and deploy GPU clusters.
Requirements:
A Bachelor's degree or higher in Computer Science, Software Engineering, Electronic Engineering, or related fields is required, along with 3+ years of experience in system engineering or DevOps.
Candidates must also have 5+ years of experience in cloud-native development or AI engineering, with at least 2 years of hands-on experience in Kubernetes multi-cluster management and orchestration.
Familiarity with the Kubernetes ecosystem and hands-on experience with tools such as kubectl and Helm are essential.
Proficiency in Docker and containerization technologies, as well as experience with monitoring tools like Prometheus and Grafana, is required.
Candidates should have hands-on experience with cloud platforms such as AWS, GCP, or Azure and an understanding of cloud-native multi-cluster architecture.
Experience with cluster management tools and familiarity with distributed file systems are a plus.
Strong communication skills, self-motivation, and the ability to collaborate in a team are necessary.
Benefits:
Joining Yotta Labs offers the opportunity to be part of a visionary team aiming to redefine AI infrastructure.
Employees will work on cutting-edge technologies that bridge AI and decentralized computing.
The role provides the chance to collaborate with experts from leading institutions and tech companies.
Yotta Labs offers a flexible, remote work environment that values innovation and autonomy.