Remote ML Research Engineer Internship, SmolLMs pretraining and datasets

Description:

Hugging Face is seeking a remote ML Research Engineer Intern to work on SmolLMs pretraining and datasets.
The role involves collaborating with the SmolLM team to build the next generation of smol language models by iterating on datasets and models quickly.
Interns will use scalable CPU clusters for dataset processing and train models on a state-of-the-art H100 cluster with nearly 100 nodes.
The internship is ideal for individuals passionate about training large language models (LLMs) and building high-quality datasets.
Candidates should be proficient in Python and have an interest in contributing to the development of accessible machine learning technology.

Applicants must provide a cover letter explaining their interest in working in open-source at Hugging Face.
The cover letter should highlight relevant skills, potential expertise, and specific topics of interest for the internship.
A passion for making complex technology accessible to engineers and artists is essential.
Candidates are encouraged to apply even if they do not meet every requirement listed.

Hugging Face promotes a culture of diversity, equity, and inclusivity, ensuring a respectful and supportive workplace.
The company offers reimbursement for relevant conferences, training, and education to support employee development.
Flexible working hours and remote work options are available, with opportunities for remote employees to visit office spaces.
Workstations will be outfitted to ensure interns can succeed in their roles.
Interns will join a community that supports significant scientific advancements through collaboration in the ML/AI field.