Remote Senior Machine Learning Engineer - Machine Learning Infrastructure

Description:

  • Flip.shop is seeking a Senior Machine Learning Engineer - Machine Learning Infrastructure to design, build, and optimize the infrastructure that powers its machine learning systems.
  • The role involves ensuring the efficient deployment, scaling, and monitoring of machine learning models, and streamlining the development lifecycle.
  • Responsibilities include designing and implementing scalable infrastructure for deploying, monitoring, and maintaining machine learning models in production environments, specifically for feeds, ads, and search ranking models.
  • The engineer will optimize the serving and training infrastructure for machine learning models and improve workflows for model training, serving, data pipelines, storage systems, and resource management within multi-tenant machine learning systems.
  • The position requires building tools to automate workflows for model training, testing, and deployment, ensuring quick transitions from development to production.
  • Performance optimization is crucial, focusing on minimizing latency and maximizing throughput for high-performance model inference at scale.
  • Collaboration with data scientists, machine learning engineers, and DevOps teams is essential to create seamless integration between development and production environments.
  • The engineer will build robust monitoring systems to track model performance and infrastructure health, ensuring reliability and uptime of machine learning services.
  • Implementing best practices in infrastructure security, data privacy, and compliance, particularly when handling sensitive user data, is also a key responsibility.

Requirements:

  • A Bachelor's degree or higher in Computer Science or a related field is required, along with 3+ years of experience in building scalable systems.
  • Proficiency in one or two programming languages (C/C++, Go) in a Linux environment is necessary.
  • A solid understanding of GPU hardware architecture and the GPU software stack (CUDA, cuDNN), along with experience in GPU performance analysis, is required.
  • Experience in deep model inference/training, debugging, and tuning is essential.
  • Familiarity with mainstream machine learning frameworks (e.g., TensorFlow, PyTorch, MXNet) is expected.
  • Knowledge of MLOps practices is required.
  • Experience with big data frameworks (e.g., Spark, Hadoop, Flink) and with resource management and task scheduling for large-scale distributed systems is necessary.
  • Experience in using or designing open-source machine learning lifecycle management systems like TFX is preferred.
  • Excellent logical analysis and problem-solving skills, a strong sense of responsibility, good learning ability, communication skills, and self-motivation are essential.
  • Good documentation habits, including writing and updating workflow and technical documentation promptly, are required.

Benefits:

  • The compensation package includes a base salary that varies based on location, experience, and performance, along with equity, bonuses, and long-term incentives.
  • A progressive PTO policy is included in the benefits package.
  • Employees will have the opportunity to work on cutting-edge infrastructure that powers personalized shopping experiences for millions of users.
  • The role offers a chance to have a lasting impact and help shape the future of social commerce at Flip.shop.