Free Human-Labeled Datasets

Lovingly annotated by the Surge AI data labeling workforce, for your wildest data needs, including hate speech and content moderation datasets, stock market and financial transaction datasets, NSFW datasets, and more, in 30+ languages.

Need a custom dataset and don't see it here? Reach out to team@surgehq.ai!

RLHF (Reinforcement Learning from Human Feedback) Dataset

Build state-of-the-art AI by training your large language models on human feedback.

Download it today

InstructGPT-style Dataset

Build state-of-the-art large language models, in the style of InstructGPT and ChatGPT.

Download it today

Profanity Dataset

Need a list of profanities, and can't dream up enough on your own? We have you covered. Get the world's best profanity dataset for free now.

Toxicity Dataset

The world's largest dataset of social media toxicity — hateful speech across Twitter, Facebook, YouTube, Reddit, and more.

Download it today

Hate Speech Dataset

A dataset of hate speech from across the Internet.

Cover your eyes and download

Sentiment Analysis Dataset

1000+ customer reviews, social media posts, and more, classified by sentiment.

Download it today

French Profanity List

A dataset of thousands of French profanities, insults, and curse words, so that you can keep your platform safe.

Download it today

Spanish Hate Speech Dataset

A collection of Spanish hate speech texts.

Download it today

Financial Transactions Dataset

A dataset of financial transactions, classified by intent and financial category.

Download it today

Twitter Hate Speech Dataset

A collection of hate speech tweets on Twitter.

Download it today

Stock Sentiment Analysis Dataset

1000 stock market tweets, labeled with their sentiment towards a publicly traded stock.

Take me to the moon

Resumes and Job Categorization Dataset

A dataset of resumes, classified with job title, category, and more.

Download it today

Search Evaluation Dataset

This search evaluation dataset contains search queries, the intent behind each search query, result URLs, and a human-evaluated search quality rating.

Download it today

Crypto Sentiment Analysis Dataset

1000 Reddit comments about crypto, labeled with Positive or Negative sentiment.

My diamond hands are ready

Japanese Profanity List

A dataset of thousands of Japanese profanities, insults, and curse words, so that you can keep your platform safe.

Download it today

Credit Card Transactions Dataset

A collection of credit card transactions, classified by intent and financial category.

Download it today

Google Search Quality Dataset

This Google Search Quality dataset contains search queries, intents, result URLs, and a human-evaluated rating.

Download it today

Twitter Sentiment Analysis Dataset

1000+ tweets, classified by sentiment.

Download it today

Japanese Hate Speech, Insults, and Toxicity Dataset

A dataset of online comments in Japanese that contain hate speech, insults, and toxicity.

Download it today

Arabic Hate Speech Dataset

A dataset of Arabic hate speech texts.

Download it today

Abortion Tweets Dataset

A collection of tweets, labeled with their stance on abortion and Roe v. Wade.

Download it today

Brand Sentiment Analysis Dataset

Ditch NPS for good; understand real user sentiment with this dataset of 1000 labeled online conversations.

Get the dataset

Dataset of Search Queries and Intents

This dataset contains search queries, as well as the user's intent when performing the search query.

Download it today

Fake News Dataset

A dataset of social media posts containing fake news.

Download it today

Facebook Misinformation Dataset

A dataset of Facebook posts containing misinformation.

Download it today

Facebook Hate Speech Dataset

A collection of hate speech posts on Facebook.

Download it today

Email Spam Dataset

A dataset of real Spam and Not Spam emails, including whether or not they were caught by Gmail's spam filters.

Download it today

German Profanity List

A dataset of thousands of German profanities, insults, and curse words, so that you can keep your platform safe.

Download it today

Arabic Profanity List

A dataset of thousands of Arabic profanities, insults, and curse words, so that you can keep your platform safe.

Download it today

Spanish Profanity List

A dataset of thousands of Spanish profanities, insults, and curse words, so that you can keep your platform safe.

Download it today

Question-Answering Dataset

A dataset of questions about real webpages, news articles, and pieces of text, along with their associated answers.

Download it today

Other Resources

Manifold

Have you ever wondered how your data is shaped? Explore your datasets in their embedding space with our interactive visualizations.

Brought to you by Surge AI

The world's highest-quality data labeling platform. We unify sophisticated labelers with the powerful tools you need to build next-gen artificial intelligence and machine learning models. Learn about some of the common pitfalls in data labeling we avoid to bring you the best data possible.