Remote GPU Cloud Platform Engineer

at Yotta Labs

Description:

  • Yotta Labs is seeking a GPU Cloud Platform Engineer to join its core infrastructure team.
  • The role involves designing, deploying, and operating large-scale, multi-cluster GPU infrastructure across data centers and cloud environments.
  • Responsibilities include ensuring high availability, performance, and efficiency of containerized AI workloads deployed in Kubernetes-based GPU clusters.
  • The engineer will build and operate large-scale, high-performance GPU clusters, monitor and troubleshoot production issues, and conduct performance testing of multi-node GPU clusters.
  • The role requires deploying and orchestrating large models across multi-cluster environments using Kubernetes, and implementing elastic scaling and cross-cluster load balancing.
  • The engineer will participate in the design and development of GPU cluster scheduling and optimization systems and define Kubernetes multi-cluster configuration standards.
  • Responsibilities also include building a unified multi-cluster management and monitoring system and coordinating with internet data center (IDC) providers to plan and deploy GPU clusters.

Requirements:

  • A Bachelor's degree or higher in Computer Science, Software Engineering, Electronic Engineering, or related fields is required, along with 3+ years of experience in system engineering or DevOps.
  • Candidates must have 5+ years of experience in cloud-native development or AI engineering, with at least 2 years of hands-on experience in Kubernetes multi-cluster management and orchestration.
  • Familiarity with the Kubernetes ecosystem and hands-on experience with tools such as kubectl and Helm is essential.
  • Proficiency in Docker and containerization technologies, as well as experience with monitoring tools like Prometheus and Grafana, is required.
  • Candidates should have hands-on experience with cloud platforms such as AWS, GCP, or Azure and an understanding of cloud-native multi-cluster architecture.
  • Experience with cluster management tools and familiarity with distributed file systems are a plus.
  • Strong communication skills, self-motivation, and the ability to collaborate in a team are necessary.

Benefits:

  • Joining Yotta Labs offers the opportunity to be part of a visionary team aiming to redefine AI infrastructure.
  • Employees will work on cutting-edge technologies that bridge AI and decentralized computing.
  • The role provides the chance to collaborate with experts from leading institutions and tech companies.
  • Yotta Labs offers a flexible, remote work environment that values innovation and autonomy.