Remote AI/ML Infrastructure Engineer

This job is closed

This job post is closed and the position has probably been filled. Please do not apply. The post was closed automatically after its apply link was detected as broken.

Description:

  • Vultr is seeking an AI/ML Infrastructure Engineer to build and support its bare-metal and GPU-based product offerings.
  • The role involves developing and maintaining infrastructure in bare-metal and containerized environments.
  • The engineer will work directly with the networking team to build scalable and supportable GPU clusters.
  • A key responsibility is ensuring an excellent customer experience through consistent, reliable provisioning of GPU infrastructure.
  • The position includes building and maintaining test automation of GPU-based products for fast and reliable provisioning.
  • The engineer will implement and maintain GPU-based solutions that support diverse applications and computational workloads.
  • Conducting in-depth benchmarking, performance testing, and troubleshooting of GPU systems to identify and resolve hardware or software limitations is required.
  • The role involves working with vendors to obtain supported drivers and packages and to resolve bugs, performance issues, and hardware problems.

Requirements:

  • Candidates must have hands-on experience working with current, high-performance GPUs, primarily NVIDIA products.
  • In-depth, hands-on experience automating bare-metal internals, including BIOS, BMC, firmware, NICs, Redfish/IPMI, and PCIe, is required.
  • Experience with rail optimization across multiple clusters and architectures is necessary.
  • Proficiency in Linux, package management, and device drivers is essential.
  • Candidates should have experience with commercial firmware.
  • Proficiency in programming languages such as Python, Bash, and PHP is required.
  • Experience with machine learning software is also necessary.

Benefits:

  • Vultr offers a 100% remote work environment along with a company-wide virtual get-together.
  • Employees can participate in a 401(k) plan with a 100% match on contributions up to 4%, with immediate vesting.
  • There is a Professional Development Reimbursement of $2,500 each year.
  • Employees receive 11 holidays, paid time off accrual, a rollover plan, and the option to take their birthday off.
  • Increased PTO is provided at the 3-year anniversary, along with a 1-month sabbatical at the 5-year anniversary and an anniversary bonus each year.
  • A $500 remote office setup allowance is provided in the first year, followed by $400 each year for new equipment.
  • Monthly internet reimbursement of up to $75 is available.
  • Employees receive $50 per month for a gym membership.

About the job

Salary: $120,000 - $150,000 USD per year