Description:
Vultr is seeking an AI Cluster Architect responsible for creating and refining large-scale GPU cluster architectures within strict power and infrastructure limits.
The role focuses on power-aware design, determining the optimal number of GPUs while considering compute nodes, storage systems, networking fabric, cooling, and facility constraints.
The architect must have deep experience with heterogeneous environments, multiple hardware generations, and end-user requirements.
Understanding the interaction of different GPU SKUs, NICs, switches, and fabrics at scale, including their power and thermal characteristics, is essential.
Responsibilities include architecting large-scale GPU clusters, modeling power consumption, evaluating networking architectures, and developing power-aware cluster configuration templates.
The architect will document the architecture and design choices and provide guidance on future-proofing for next-generation technologies.
Collaboration with vendors on novel fabric architectures for large-scale deployments is also a key responsibility.
Requirements:
Candidates must have 7+ years of experience designing or building large-scale HPC, AI, or hyperscale GPU clusters.
An expert understanding of GPU and accelerator system design, including node topology and PCIe/NVLink/NVSwitch/ROCm, is required.
Strong familiarity with InfiniBand, RoCE, and Spectrum-X networking, including multi-tier and large-radix switch design, is necessary.
Demonstrated experience in modeling power draw and thermal characteristics of various systems is essential.
The ability to design networks that maintain full non-blocking performance, or that manage the impact of over- or under-subscription, is required.
Proven skills in gathering and analyzing vendor SKU-level specifications for scalable cluster architectures are needed.
Experience in balancing customer-driven requirements for compute, storage, and service density is important.
Strong documentation, communication, and cross-functional collaboration skills are essential for this role.
Benefits:
Vultr offers excellent medical benefits with 100% company-paid premiums for employee-only plans, as well as 100% company-paid dental and vision premiums.
A 401(k) plan is available with a 100% match up to 4% and immediate vesting.
Professional development reimbursement of $2,500 each year is provided.
Employees enjoy 11 holidays, paid time off accrual, a rollover plan, and the option to take their birthday off.
Increased PTO is offered at 3-year and 10-year anniversaries, along with a 1-month paid sabbatical every 5 years and an anniversary bonus each year.
A $500 remote office setup allowance in the first year and $400 each following year for new equipment are included.
Internet reimbursement up to $75 per month and gym membership reimbursement up to $50 per month are also provided.
A company-paid Wellable subscription is part of the benefits package.