This job post is closed and the position is probably filled. Please do not apply.
🤖 Automatically closed by a robot after the apply link was detected as broken.
Description:
Vultr is seeking an AI/ML Infrastructure Engineer to build and support their bare metal and GPU-based product offerings.
The role involves developing and maintaining infrastructure in bare metal and containerized environments.
The engineer will work directly with the networking team to build scalable and supportable GPU clusters.
A key responsibility is ensuring an excellent customer experience through consistent and reliable provisioning of GPU infrastructure.
The position includes building and maintaining test automation of GPU-based products for fast and reliable provisioning.
The engineer will implement and maintain GPU-based solutions to meet diverse applications and computational workloads.
The engineer will conduct in-depth benchmarking, performance testing, and troubleshooting of GPU systems to identify and resolve hardware or software limitations.
The role involves working with vendors to obtain supported drivers and packages, as well as addressing any bugs, performance-related issues, and hardware problems.
Requirements:
Candidates must have hands-on experience working with current, high-performance GPUs, primarily NVIDIA products.
In-depth, hands-on experience with automating bare metal internals including BIOS, BMC, firmware, NICs, Redfish/IPMI, and PCIe is required.
Experience with rail optimization across multiple clusters and architectures is necessary.
Proficiency in Linux, package management, and device drivers is essential.
Candidates should have experience with commercial firmware.
Proficiency in programming languages such as Python, Bash, and PHP is required.
Experience with Machine Learning software is also necessary.
Benefits:
Vultr offers a 100% remote work environment along with a company-wide virtual get-together.
Employees can participate in a 401(k) plan that matches 100% up to 4% with immediate vesting.
There is a Professional Development Reimbursement of $2,500 each year.
Employees receive 11 holidays, paid time off accrual, a rollover plan, and the option to take their birthday off.
Increased PTO is provided at the 3-year anniversary, along with a 1-month sabbatical at the 5-year anniversary and an anniversary bonus each year.
A $500 first-year remote office setup allowance is provided, along with $400 each following year for new equipment.
Monthly internet reimbursement of up to $75 is available.
Employees receive $50 per month for a gym membership.