Description:

Flip.shop is seeking a Senior Machine Learning Engineer - Machine Learning Infrastructure to design, build, and optimize the infrastructure that powers its machine learning systems.
The role involves ensuring the efficient deployment, scaling, and monitoring of machine learning models, and streamlining the development lifecycle.
Responsibilities include designing and implementing scalable infrastructure for deploying, monitoring, and maintaining machine learning models in production environments, specifically for feeds, ads, and search ranking models.
The engineer will optimize the serving and training infrastructure of machine learning models and enhance workflows for model training, serving, data pipelines, storage systems, and resource management within multi-tenancy machine learning systems.
The position requires building tools to automate workflows for model training, testing, and deployment, ensuring quick transitions from development to production.
Performance optimization is crucial, focusing on minimizing latency and maximizing throughput for high-performance model inference at scale.
Collaboration with data scientists, machine learning engineers, and DevOps teams is essential to create seamless integration between development and production environments.
The engineer will build robust monitoring systems to track model performance and infrastructure health, ensuring reliability and uptime of machine learning services.
Implementing best practices in infrastructure security, data privacy, and compliance, particularly when handling sensitive user data, is also a key responsibility.

Requirements:

A Bachelor's degree or higher in Computer Science or a related field is required, along with 3+ years of experience in building scalable systems.
Proficiency in one or two programming languages (C/C++, Golang) within a Linux environment is necessary.
A solid understanding of GPU hardware architecture, GPU software stack (CUDA, cuDNN), and experience in GPU performance analysis is required.
Experience in deep learning model inference and training, including debugging and tuning, is essential.
Familiarity with mainstream machine learning frameworks (e.g., TensorFlow, PyTorch, MXNet) is expected.
Knowledge of MLOps practices is required.
Experience with big data frameworks (e.g., Spark, Hadoop, Flink) and resource management and task scheduling for large-scale distributed systems is necessary.
Experience in using or designing open-source machine learning lifecycle management systems like TFX is preferred.
Excellent logical analysis and problem-solving skills, a strong sense of responsibility, the ability to learn quickly, good communication skills, and self-motivation are essential.
Good documentation habits, including timely writing and updating of workflow and technical documentation, are required.

Benefits:

The compensation package includes a base salary that varies based on location, experience, and performance, along with equity, bonuses, and long-term incentives.
A progressive PTO policy is included in the benefits package.
Employees will have the opportunity to work on cutting-edge infrastructure that powers personalized shopping experiences for millions of users.
The role offers a chance to have a lasting impact and help shape the future of social commerce at Flip.shop.

Remote Senior Machine Learning Engineer - Machine Learning Infrastructure
