Remote AI/ML Infrastructure Engineer


This job is closed

This job post is closed and the position is probably filled. Please do not apply. The post was closed automatically after its apply link was detected as broken.

Description:

  • Vultr is seeking an AI/ML Infrastructure Engineer to build and support their bare metal and GPU-based product offerings.
  • The role involves developing and maintaining infrastructure in bare metal and containerized environments.
  • The engineer will work directly with the networking team to build scalable and supportable GPU clusters.
  • Responsibilities include ensuring excellent customer experience through consistent and reliable provisioning of GPU infrastructure.
  • The engineer will build and maintain test automation of GPU-based products for fast and reliable provisioning.
  • The position requires implementing and maintaining GPU-based solutions to meet diverse applications and computational workloads.
  • The engineer will conduct in-depth benchmarking, performance testing, and troubleshooting of GPU systems to identify and resolve hardware or software limitations.
  • The role includes working with vendors to obtain supported drivers and packages, and addressing any hardware, software, or performance issues promptly.

Requirements:

  • Candidates must have hands-on experience working with current, high-performance GPUs, primarily NVIDIA products.
  • In-depth, hands-on experience with automating bare metal internals including BIOS, BMC, firmware, NICs, Redfish/IPMI, and PCIe is required.
  • Experience with rail optimization across multiple clusters and architectures is necessary.
  • Proficiency in Linux, package management, and device drivers is essential.
  • Candidates should have experience with commercial firmware.
  • Knowledge of programming languages such as Python, Bash, and PHP is required.
  • Experience with Machine Learning software is also necessary.

Benefits:

  • Vultr offers a 100% remote work environment along with a company-wide virtual get-together.
  • Employees can participate in a 401(k) plan with a 100% match up to 4% and immediate vesting.
  • There is a Professional Development Reimbursement of $2,500 each year.
  • Employees receive 11 holidays, paid time off accrual, a rollover plan, and the option to take their birthday off.
  • Increased PTO is provided at the 3-year anniversary, along with a 1-month sabbatical at the 5-year anniversary and an anniversary bonus each year.
  • A $500 remote office setup allowance is provided in the first year, followed by $400 each subsequent year for new equipment.
  • Monthly internet reimbursement of up to $75 is provided.
  • Employees receive $50 per month for a gym membership.

About the job:

  • Salary: $120,000 - $150,000 USD / year