Remote Principal MLOps Engineer

Please, let Rackspace know you found this job on RemoteYeah. This helps us grow 🌱.

Description:

  • We are looking for a seasoned Principal MLOps Engineer to architect, build, and optimize ML inference platforms.
  • The role requires significant expertise in machine learning engineering and infrastructure, with a focus on building ML inference systems.
  • Proven experience in building and scaling ML inference platforms in a production environment is crucial.
  • This remote position demands exceptional communication skills and the ability to independently tackle complex challenges with innovative solutions.
  • Responsibilities include architecting and optimizing existing data infrastructure to support cutting-edge machine learning and deep learning models.
  • The engineer will collaborate closely with cross-functional teams to translate business objectives into robust engineering solutions.
  • The role involves owning the end-to-end development and operation of high-performance, cost-effective inference systems for a diverse range of models, including state-of-the-art LLMs.
  • The engineer will provide technical leadership and mentorship to foster a high-performing engineering team.

Requirements:

  • A proven track record in designing and implementing cost-effective and scalable ML inference systems is required.
  • Hands-on experience with leading machine learning and deep learning frameworks such as TensorFlow, Keras, or Spark MLlib is necessary.
  • A solid foundation in machine learning algorithms, natural language processing, and statistical modeling is essential.
  • A strong grasp of fundamental computer science concepts including algorithms, distributed systems, data structures, and database management is needed.
  • The ability to tackle complex challenges and devise effective solutions is crucial, as is the critical thinking needed to approach problems from multiple angles.
  • Experience working effectively in a remote setting while maintaining strong written and verbal communication skills is required.
  • Proven experience in the Apache Hadoop ecosystem (Oozie, Pig, Hive, MapReduce) is necessary.
  • Expertise in public cloud services, particularly in GCP and Vertex AI, is required.
  • Proven expertise in applying model optimization techniques (distillation, quantization, hardware acceleration) to production environments is a must-have.
  • Proficiency in Java, backed by recent hands-on experience, is required.
  • An in-depth understanding of LLM architectures, parameter scaling, and deployment trade-offs is necessary.
  • A technical degree is required: either a Bachelor's degree in Computer Science with at least 10 years of relevant industry experience, or a Master's degree in Computer Science with at least 8 years of relevant industry experience.
  • A specialization in Machine Learning is preferred.

Benefits:

  • The anticipated starting pay range for Colorado is $204,000 - $255,000.
  • The anticipated starting pay range for Hawaii and New York (not including NYC) is $191,600 - $239,500.
  • The anticipated starting pay range for California, New York City, and Washington is $223,200 - $279,000.
  • The role may include variable compensation in the form of bonuses, commissions, or other discretionary payments based on company and/or individual performance.
  • Actual compensation is influenced by various factors including skill set, level of experience, licenses and certifications, and specific work location.
  • Rackspace Technology offers a range of benefits, which can be found on their benefits page.