Remote Machine Learning Engineer, Infrastructure at Rad AI

Description:

Design, implement, and maintain the infrastructure supporting machine learning applications, services, and workflows
Build and enhance the ML platform for continuous integration, delivery, and training of models
Utilize low-level programming languages, cloud native services, and serverless architectures for scalable systems
Develop components in the data pipeline to enable various machine learning models in production
Lead infrastructure projects, including technical designs, plans, and specifications
Design, deploy, and maintain the full ML platform stack with monitoring and data observability
Identify bottlenecks in the pipeline and optimize the throughput and latency of ML components
Develop automation tools for model training and deployment

Requirements:

4 years of experience in ML Systems Engineering
4 years of industry experience in Python or other common ML languages
Proficiency in infrastructure and DevOps tools like Kubernetes, Docker, and Ansible
Knowledge of distributed systems, storage systems, and databases
Familiarity with cloud computing platforms such as AWS, GCP, and Azure
Experience with infrastructure-as-code tools like Terraform, Pulumi, and CloudFormation
Proficiency in monitoring tools like CloudWatch, New Relic, and Prometheus
Strong communication skills, problem-solving approach, and sense of ownership
Ability to manage and lead active incidents and conduct blameless postmortems