Remote Senior AI Infra Engineer, AI/ML and Data Infrastructure
This job is closed
This job post is closed and the position is probably filled. Please do not apply.
🤖 Automatically closed by a robot after apply link was detected as broken.
Description:
The Chan Zuckerberg Initiative is seeking a Senior AI Infra Engineer to work on AI/ML and Data Infrastructure.
The role involves designing, building, and scaling software systems to assist educators, scientists, and policy experts in addressing various challenges.
The engineer will collaborate with team members to develop efficient, stable, performant, scalable, and secure AI/ML and Data infrastructure solutions.
Responsibilities include active hands-on coding for Deep Learning and Machine Learning models, integrating complex systems with large-scale AI/ML GPU compute infrastructure, and working on heterogeneous and distributed AI/ML environments.
The engineer will also participate in designing and building Cloud-based AI/ML platform solutions, collaborating on data management solutions, and developing tooling to empower AI/ML efforts with GPU Compute Cluster and other compute environments.
Requirements:
A BS or MS degree in Computer Science or a related technical discipline, or equivalent experience, is required.
Candidates should have 5+ years of relevant coding experience and 3+ years of systems Architecture and Design experience across Data, AI/ML, Core Infrastructure, and Security Engineering.
Proficiency in scaling containerized applications on Kubernetes or Mesos is required, along with expertise in creating custom containers and continuous deployment systems.
Experience with Amazon Web Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure, and knowledge of On-Prem and Colocation Service hosting environments are necessary.
Strong coding ability in systems languages like Rust, C/C++, C#, Go, Java, or Scala, and proficiency in scripting languages such as Python, PHP, or Ruby are required.
Familiarity with AI/ML Platform Operations, large-scale Kafka and Spark deployments, Workflow scheduling tools, NVIDIA CUDA, Linux systems optimization, Data Engineering, Data Governance, and AI/ML execution platforms is essential.
Experience with PyTorch, Keras, or TensorFlow, and HPC with Slurm is a plus.
Benefits:
The base pay range for this role in Redwood City, CA is $190,000 - $285,000, with opportunities for growth over time.
CZI offers a generous employer match on employee 401(k) contributions, annual benefits for employees, CZI Life of Service Gifts, paid time off to volunteer, funding for family-forming benefits, and relocation support for employees moving to the Bay Area.
Additional benefits include a commitment to diversity, equity, and inclusion efforts, fair treatment, equal access to opportunity, and a workplace where everyone feels welcomed, respected, supported, and valued.