Remote Senior MLOps Engineer (Canada) at Rackspace

Description:

We are looking for a seasoned Senior MLOps Engineer to architect, build, and optimize an ML inference platform.
The role requires significant expertise in machine learning engineering and infrastructure, with a focus on building ML inference systems.
Proven experience in building and scaling ML inference platforms in a production environment is crucial.
This remote position requires exceptional communication skills and the ability to independently tackle complex challenges with innovative solutions.
Responsibilities include architecting and optimizing existing data infrastructure to support cutting-edge machine learning and deep learning models.
The engineer will collaborate closely with cross-functional teams to translate business objectives into robust engineering solutions.
The role involves owning the end-to-end development and operation of high-performance, cost-effective inference systems for a diverse range of models, including state-of-the-art LLMs.
The engineer will provide technical leadership and mentorship to foster a high-performing engineering team.

A proven track record in designing and implementing cost-effective and scalable ML inference systems is required.
Hands-on experience with leading machine learning and deep learning frameworks, such as TensorFlow, Keras, or Spark MLlib, is necessary.
A solid foundation in machine learning algorithms, natural language processing, and statistical modeling is essential.
A strong grasp of fundamental computer science concepts including algorithms, distributed systems, data structures, and database management is required.
Proficiency in Java, with recent hands-on experience, is mandatory.
The ability to tackle complex challenges and devise effective solutions using critical thinking is essential.
Experience working effectively in a remote setting while maintaining strong written and verbal communication skills is required.
Proven experience with the Apache Hadoop ecosystem (Oozie, Pig, Hive, MapReduce) is necessary.
Expertise in public cloud services, particularly GCP and Vertex AI, is required.
Proven expertise in applying model optimization techniques (distillation, quantization, hardware acceleration) to production environments is a must.
An in-depth understanding of LLM architectures, parameter scaling, and deployment trade-offs is required.
A technical degree is necessary: either a Bachelor's degree in Computer Science with at least 8 years of relevant industry experience, or a Master's degree in Computer Science with at least 6 years of relevant industry experience.
A specialization in Machine Learning is preferred.

Rackspace Technology has been recognized year after year as one of the best places to work by Fortune, Forbes, and Glassdoor.
The company attracts and develops world-class talent, providing opportunities for professional growth.
Employees are encouraged to bring their whole selves to work and embrace unique perspectives that fuel innovation.
Rackspace Technology is committed to offering equal employment opportunities without regard to various legally protected characteristics.
The company is dedicated to accommodating individuals with disabilities or special needs.