Hugging Face

ML Research Engineer Internship, FineWeb - US Remote

Posted 9 Days Ago
Remote
Hiring Remotely in United States
Internship
As an ML Research Engineer Intern, you'll build high-quality web data and contribute to running distributed data processing while enhancing data quality with small models.
The summary above was generated by AI

Description

At Hugging Face, we’re on a journey to democratize good AI. We are building the fastest-growing platform for AI builders, with over 5 million users and 100k organizations who have collectively shared over 1M models, 300k datasets, and 300k apps. Our open-source libraries have more than 400k stars on GitHub.

About the Role

High-quality datasets are the foundation of strong LLMs, yet most labs releasing state-of-the-art models are vague when it comes to their pretraining data. At Hugging Face we want to enable the whole community to build the best models by building and open-sourcing the finest datasets. FineWeb and FineWeb-Edu are examples of very strong, web-scale datasets we released this year, alongside the open-source distributed processing library datatrove.

During this internship you will work alongside the FineWeb team to build the next generation of high-quality web data by running distributed data processing and ablating data quality through training small models. Check out hf.co/science for more information about the science team at Hugging Face, and see the FineWeb team's blog posts for this team's work specifically.

About You

If you love open-source but also have an eye for art and creativity, are passionate about making complex technology more accessible to engineers and artists, and want to contribute to one of the fastest-growing ML ecosystems, then we can't wait to see your application!

If you're interested in joining us but don't tick every box above, we still encourage you to apply! We're building a diverse team whose skills, experiences, and backgrounds complement one another. We're happy to consider where you might be able to make the biggest impact.

More about Hugging Face

We are actively working to build a culture that values diversity, equity, and inclusivity. We are intentionally building a workplace where people feel respected and supported, regardless of who they are or where they come from. We believe this is foundational to building a great company and community. Hugging Face is an equal opportunity employer and we do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

We value development. You will work with some of the smartest people in our industry. We are an organization that has a bias for impact and is always challenging itself to keep growing. We provide all employees with reimbursement for relevant conferences, training, and education.

We care about your well-being. We offer flexible working hours and remote options. We support our employees wherever they are. While we have office spaces around the world, especially in the US, Canada, and Europe, we're very distributed and all remote employees have the opportunity to visit our offices. If needed, we'll also outfit your workstation to ensure you succeed.

We support the community. We believe significant scientific advancements are the result of collaboration across the field. Join a community supporting the ML/AI community.

Requirements

Please provide a cover letter mentioning why you would like to work in open-source at Hugging Face. We encourage you to mention your skills, potential expertise, and topics on which you would like to work.

Top Skills

Data Processing
ML
Web-Scale Datasets
