The World's Best Social Media Toxicity Dataset

Saving the internet is fun. Combing through thousands of online comments to build a toxicity dataset isn't. That's why we're creating the world's largest dataset of social media toxicity — so you can skip the slog and get to work.
Get it for free now!

The dataset

This dataset contains 1,000+ online comments from a range of social media platforms. Each comment is labeled as either toxic or non-toxic.
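As a rough illustration of working with a two-category dataset like this, here is a minimal Python sketch that reads labeled comments and counts each class. The column names (text, is_toxic), the label strings, and the sample rows are all assumptions made for the example; check the downloaded file for its actual layout.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for the downloaded file. The column names
# "text" and "is_toxic" and the label strings are assumptions,
# not the confirmed schema of the dataset.
SAMPLE_CSV = """text,is_toxic
"placeholder comment one",Toxic
"placeholder comment two",Not Toxic
"placeholder comment three",Not Toxic
"""


def load_comments(handle):
    """Read rows into (text, is_toxic) pairs, mapping the label to a bool."""
    rows = []
    for row in csv.DictReader(handle):
        label = row["is_toxic"].strip().lower()
        rows.append((row["text"], label == "toxic"))
    return rows


comments = load_comments(io.StringIO(SAMPLE_CSV))

# Tally how many comments fall into each of the two categories.
counts = Counter("toxic" if flag else "non-toxic" for _, flag in comments)
print(counts)
```

To use it on a real file, pass an open file handle to load_comments instead of the in-memory sample, and adjust the column names to match.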
[Figure: three bubbles reading "1000+ popular English profanities", "10 categories", and "20+ languages"]
Built with an elite workforce

Surge AI is a data labeling platform and workforce. Our labeling team pored over tens of thousands of social media comments to build this toxicity dataset. Each comment was then evaluated by multiple members of our team to determine its severity level.

Get it right now!

This repo contains 1000+ comments from popular social media platforms.

Love language?
So do we.

We're a team of engineers, researchers, and linguists from Google, Facebook, Harvard, and MIT. We started Surge AI to build the infrastructure to power NLP.

We work with companies at the forefront of AI to solve their core machine learning and language problems — from detecting hate speech, to parsing complex military documents, to injecting human values into the next wave of language models.
