Please let Rackspace know you found this job on RemoteYeah.
This helps us grow 🌱.
Description:
We are looking for a seasoned Principal MLOps Engineer to architect, build, and optimize ML inference platforms.
The role requires deep expertise in machine learning engineering and infrastructure, with a focus on production inference systems.
Proven experience in building and scaling ML inference platforms in a production environment is crucial.
This remote position demands exceptional communication skills and the ability to independently tackle complex challenges with innovative solutions.
Responsibilities include architecting and optimizing existing data infrastructure to support cutting-edge machine learning and deep learning models.
The engineer will collaborate closely with cross-functional teams to translate business objectives into robust engineering solutions.
The role involves owning the end-to-end development and operation of high-performance, cost-effective inference systems for a diverse range of models, including state-of-the-art LLMs.
The engineer will provide technical leadership and mentorship to foster a high-performing engineering team.
Requirements:
A proven track record in designing and implementing cost-effective and scalable ML inference systems is required.
Hands-on experience with leading machine learning frameworks such as TensorFlow, Keras, or Spark MLlib is necessary.
A solid foundation in machine learning algorithms, natural language processing, and statistical modeling is essential.
A strong grasp of fundamental computer science concepts including algorithms, distributed systems, data structures, and database management is needed.
The ability to tackle complex challenges and devise effective solutions is crucial, along with the critical thinking to approach problems from multiple angles.
Experience working effectively in a remote setting while maintaining strong written and verbal communication skills is required.
Proven experience in the Apache Hadoop ecosystem (Oozie, Pig, Hive, MapReduce) is necessary.
Expertise in public cloud services, particularly in GCP and Vertex AI, is required.
Proven expertise in applying model optimization techniques (distillation, quantization, hardware acceleration) to production environments is a must-have.
Proficiency and recent experience in Java is required.
An in-depth understanding of LLM architectures, parameter scaling, and deployment trade-offs is necessary.
A technical degree is required: a Bachelor's degree in Computer Science with at least 10 years of relevant industry experience, or a Master's degree in Computer Science with at least 8 years of relevant industry experience.
A specialization in Machine Learning is preferred.
Benefits:
The anticipated starting pay range for Colorado is $204,000 - $255,000.
The anticipated starting pay range for Hawaii and New York (not including NYC) is $191,600 - $239,500.
The anticipated starting pay range for California, New York City, and Washington is $223,200 - $279,000.
The role may include variable compensation in the form of bonuses, commissions, or other discretionary payments based on company and/or individual performance.
Actual compensation is influenced by various factors including skill set, level of experience, licenses and certifications, and specific work location.
Rackspace Technology offers a range of benefits, which can be found on their benefits page.
Apply now