This job post is closed and the position is probably filled. Please do not apply.
🤖 Automatically closed by a robot after apply link was detected as broken.
Description:
The ML Ops Engineer will work on building shared tools and platforms for the Chan Zuckerberg Initiative, supporting Research Scientists, Data Scientists, AI Research Scientists, and Engineers in the Education and Science domains.
They will be responsible for MLOps and AI development projects, focusing on GPU Cloud Cluster operations and ensuring system stability throughout the AI lifecycle.
The role involves collaborating with AI Researchers, building model deployment automation and monitoring systems, and integrating MLflow for model versioning and experiment tracking.
The engineer will optimize the GPU platform and model training processes, work on cloud-based AI/ML data platform solutions, and contribute to defining and implementing SRE-style service level indicators.
They will also assist in troubleshooting and resolving issues on the Kubernetes-based GPU Cluster and participate in data management solutions for large-scale training datasets.
Requirements:
BS, MS, or PhD in Computer Science or related field, or equivalent experience.
Experience in MLOps with medium- to large-scale GPU clusters, Kubernetes, or HPC environments.
Proficiency in DevOps tooling for data and machine learning use cases, including containerized applications on Kubernetes or Mesos.
5+ years of coding experience with Python, PHP, or Ruby, and a systems language like Rust, C/C++, C#, Go, Java, or Scala.
Familiarity with data platform operations using tools like Kafka, Spark, and Airflow.
Knowledge of AWS, GCP, or Azure, and Linux systems optimization and administration.
Understanding of Data Engineering, Data Governance, Data Infrastructure, and AI/ML execution platforms.
Benefits:
CZI offers a generous employer match on employee 401(k) contributions and annual benefits that can be used for various needs.
Employees receive CZI Life of Service Gifts, paid time off for volunteering, and funding for family-forming benefits.
Relocation support is provided for those moving to the Bay Area.
The company is committed to diversity, equity, and inclusion efforts, ensuring fair treatment and equal access to opportunities for all team members.