Remote Senior AI Infra Engineer, AI/ML and Data Infrastructure

This job is closed

This job post is closed and the position is probably filled. Please do not apply. It was closed automatically after the apply link was detected as broken.

Description:

  • The Chan Zuckerberg Initiative is seeking a Senior AI Infra Engineer to work on AI/ML and Data Infrastructure.
  • The role involves designing, building, and scaling software systems to assist educators, scientists, and policy experts in addressing various challenges.
  • The engineer will collaborate with team members to develop efficient, stable, performant, scalable, and secure AI/ML and Data infrastructure solutions.
  • Responsibilities include active hands-on coding for Deep Learning and Machine Learning models, integrating complex systems with large-scale AI/ML GPU compute infrastructure, and working on heterogeneous and distributed AI/ML environments.
  • The engineer will also participate in designing and building Cloud-based AI/ML platform solutions, collaborating on data management solutions, and developing tooling to empower AI/ML efforts with GPU Compute Cluster and other compute environments.

Requirements:

  • A BS or MS degree in Computer Science or a related technical discipline or equivalent experience is required.
  • Candidates should have 5+ years of relevant coding experience and 3+ years of systems architecture and design experience across Data, AI/ML, Core Infrastructure, and Security Engineering.
  • Proficiency in scaling containerized applications on Kubernetes or Mesos, along with expertise in creating custom containers and continuous deployment systems, is required.
  • Experience with Amazon Web Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure, and knowledge of On-Prem and Colocation Service hosting environments is necessary.
  • Strong coding ability in systems languages like Rust, C/C++, C#, Go, Java, or Scala, and proficiency in scripting languages such as Python, PHP, or Ruby is required.
  • Familiarity with AI/ML Platform Operations, large-scale Kafka and Spark deployments, workflow scheduling tools, NVIDIA CUDA, Linux systems optimization, Data Engineering, Data Governance, and AI/ML execution platforms is essential.
  • Experience with PyTorch, Keras, or TensorFlow, and with HPC using Slurm, is a plus.

Benefits:

  • The base pay range for this role in Redwood City, CA is $190,000 - $285,000, with opportunities for growth over time.
  • CZI offers a generous employer match on employee 401(k) contributions, annual benefits for employees, CZI Life of Service Gifts, paid time off to volunteer, funding for family-forming benefits, and relocation support for employees moving to the Bay Area.
  • CZI also commits to diversity, equity, and inclusion: fair treatment, equal access to opportunity, and a workplace where everyone feels welcomed, respected, supported, and valued.