Remote ML Research Engineer Internship, SmolLMs pretraining and datasets

Description:

Hugging Face is seeking an ML Research Engineer Intern to work remotely, within the EMEA region, on SmolLM pretraining and datasets.
The role involves collaborating with the SmolLM team to build the next generation of smol language models by iterating quickly on datasets and models.
Interns will use scalable CPU clusters for dataset processing and train models on a state-of-the-art H100 cluster of nearly 100 nodes.
The internship is ideal for individuals passionate about training large language models (LLMs) and building high-quality datasets.
Candidates should be proficient in Python and have a strong interest in contributing to the development of accessible machine learning technology.

Applicants must provide a cover letter explaining their interest in working in open-source at Hugging Face.
The cover letter should highlight relevant skills, areas of expertise, and specific topics of interest for the internship.
A passion for making complex technology accessible to engineers and artists is essential.
Candidates are encouraged to apply even if they do not meet every listed requirement.

Hugging Face promotes a culture of diversity, equity, and inclusivity, ensuring a respectful and supportive workplace for all employees.
The company values professional development, offering reimbursement for relevant conferences, training, and education.
Flexible working hours and remote work options are available to support employee well-being.
Employees have the opportunity to visit office spaces around the world, particularly in the US, Canada, and Europe.
Hugging Face is committed to supporting the ML/AI community through collaboration and shared advancements in the field.