This job post is closed and the position is probably filled. Please do not apply.
🤖 Automatically closed by a robot after the apply link was detected as broken.
Description:
Vultr is seeking an AI/ML Infrastructure Engineer to build and support its bare-metal and GPU-based product offerings.
The role involves developing and maintaining infrastructure in bare-metal and containerized environments.
The engineer will work directly with the networking team to build scalable and supportable GPU clusters.
Responsibilities include ensuring an excellent customer experience through consistent, reliable provisioning of GPU infrastructure.
The engineer will build and maintain test automation for GPU-based products to keep provisioning fast and reliable.
The position requires implementing and maintaining GPU-based solutions that meet the needs of diverse applications and computational workloads.
The engineer will conduct in-depth benchmarking, performance testing, and troubleshooting of GPU systems to identify and resolve hardware or software limitations.
The role includes working with vendors to obtain supported drivers and packages, and promptly addressing any hardware, software, or performance issues.
Requirements:
Candidates must have hands-on experience working with current, high-performance GPUs, primarily NVIDIA products.
In-depth, hands-on experience automating bare-metal internals, including BIOS, BMC, firmware, NICs, Redfish/IPMI, and PCIe, is required.
Experience with rail optimization across multiple clusters and architectures is necessary.
Proficiency in Linux, package management, and device drivers is essential.
Candidates should have experience with commercial firmware.
Knowledge of programming languages such as Python, Bash, and PHP is required.
Experience with Machine Learning software is also necessary.
Benefits:
Vultr offers a 100% remote work environment along with a company-wide virtual get-together.
Employees can participate in a 401(k) plan that matches 100% of contributions up to 4%, with immediate vesting.
Employees receive a professional development reimbursement of $2,500 per year.
Employees receive 11 holidays, paid time off accrual, a rollover plan, and the option to take off their birthday.
Increased PTO is provided at the 3-year anniversary, along with a 1-month sabbatical at the 5-year anniversary and an anniversary bonus each year.
A $500 first-year remote office setup allowance and $400 each year following for new equipment is included.
Monthly internet reimbursement of up to $75 is provided.
Employees receive $50 per month for a gym membership.