Hugging Face
Remote (Île-de-France, France)
Here at Hugging Face, we’re on a journey to advance good Machine Learning and make it more accessible. Along the way, we contribute to the development of technology for the better. We have built the fastest-growing open-source library of pre-trained models in the world. It has 100M+ installs and 65K+ stars on GitHub, and over 10 thousand companies use HF technology in production, including leading AI organizations such as Google, Elastic, Salesforce, Algolia, and Grammarly.
About the role
During this internship, you will work with the open-science team at Hugging Face to develop a framework for creating, evaluating, and visualizing datasets for large language models, covering text and text-like data (code, mathematics…) as well as other modalities paired with text (images and text, for instance). You’ll integrate with and contribute to open-source libraries, such as Datasets or Megatron-LM, and create new tools dedicated to getting the best possible data for large model...