Applied Research Internship - Datasets for large language models

  • Hugging Face
  • Remote (Île-de-France, France)
  • 03 Mar, 2023

Job Description

Here at Hugging Face, we’re on a journey to advance good Machine Learning and make it more accessible. Along the way, we contribute to the development of technology for the better.

We have built the fastest-growing open-source library of pre-trained models in the world. It has over 100M installs and 65K stars on GitHub, and over 10 thousand companies are using HF technology in production, including leading AI organizations such as Google, Elastic, Salesforce, Algolia, and Grammarly.

About the role

During this internship, you will work with the open-science team at Hugging Face to develop a framework for creating, evaluating, and visualizing datasets for large language models. The framework will cover text and text-like data (code, mathematics…) as well as other modalities paired with text (images and text, for instance). You’ll integrate with and contribute to open-source libraries, such as Datasets or Megatron-LM, and create new tools dedicated to getting the best possible data for large model pretraining, the most impactful part of the whole pipeline.

At the intersection of open science and open source, this internship will have you interact with researchers from a thriving branch of science, as well as maintainers and users of one of the most active open-source ecosystems. We aspire to put you in a position to do your most impactful work.

About you

If you’re pragmatic and data-driven, and you are looking for a high-impact project to push the state of the art in the field, then we can't wait to see your application!

The ideal applicant should have:

  • Experience with Torch or another deep learning framework of choice
  • Experience with web data or with large (> 100GB) datasets in general
  • Scientific spirit and good communication skills

If you're interested in joining us but don't tick every box above, we still encourage you to apply! We're building a diverse team whose skills, experiences, and backgrounds complement one another. We're happy to consider where you might be able to make the biggest impact.

More about Hugging Face

We are actively working to build a culture that values diversity, equity, and inclusivity. We are intentionally building a workplace where people feel respected and supported, regardless of who they are or where they come from. We believe this is foundational to building a great company and community. Hugging Face is an equal opportunity employer and we do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

We value development. You will work with some of the smartest people in our industry. We are an organization that has a bias for impact and is always challenging itself to grow. We provide all employees with reimbursement for relevant conferences, training, and education.

We care about your well-being. We offer flexible working hours and remote options. We offer health, dental, and vision benefits for employees and their dependents. We also offer 12 weeks of parental leave (20 for birthing mothers) and unlimited paid time off.

We support our employees wherever they are. While we have office spaces in NYC and Paris, we're very distributed and all remote employees have the opportunity to visit our offices. If needed, we'll also outfit your workstation to ensure you succeed.

We want our teammates to be shareholders. All employees have company equity as part of their compensation package. If we succeed in becoming a category-defining platform in machine learning and artificial intelligence, everyone enjoys the upside.

We support the community. We believe major scientific advancements are the result of collaboration across the field. Join a community supporting the ML/AI community.