The Obscenity List

Scott Heiner
Jan 8, 2022

Ever wish you had a ready-made list of profanity? Maybe you want to remove NSFW comments, filter offensive usernames, or build content moderation tools, and you can't dream up enough obscenities on your own. You’re in luck — Surge AI is creating the world's largest profanity dataset, in 20+ languages. Get it for free now. 

Dataset

The dataset contains 1600+ popular English profanities and their variations.

Columns

text: the profanity

canonical_form_1: the profanity's canonical form

canonical_form_2: an additional canonical form, if applicable

canonical_form_3: an additional canonical form, if applicable

category_1: the profanity's primary category (see below for list of categories)

category_2: the profanity's secondary category, if applicable

category_3: the profanity's tertiary category, if applicable

severity_rating: We asked 5 Surge AI data labelers to rate how severe they believed each profanity to be, on a 1-3 point scale. This is the mean of those 5 ratings.

severity_description: We rounded `severity_rating` to the nearest integer. `Mild` corresponds to a rounded mean rating of `1`, `Strong` to `2`, and `Severe` to `3`.
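The relationship between the two severity columns can be sketched in a few lines of Python. This is an illustration of the rule described above, not the code used to build the dataset: the function names are made up for this example, and rounding halves upward is an assumption. With five whole-number ratings the mean is always a multiple of 0.2, so an exact .5 tie never comes up.

```python
import math
from statistics import mean

# Labels as described for the severity_description column.
SEVERITY_LABELS = {1: "Mild", 2: "Strong", 3: "Severe"}


def severity_rating(ratings):
    """Mean of the individual labeler ratings (each must be 1, 2, or 3)."""
    if not ratings or any(r not in (1, 2, 3) for r in ratings):
        raise ValueError("ratings must be a non-empty list of 1, 2, or 3")
    return mean(ratings)


def severity_description(rating):
    """Round a mean rating to the nearest integer and map it to a label."""
    # Assumption: halves round up. Python's built-in round() rounds halves
    # to the nearest even number, so floor(x + 0.5) is used to be explicit.
    nearest = math.floor(rating + 0.5)
    return SEVERITY_LABELS[nearest]


if __name__ == "__main__":
    example_ratings = [1, 2, 2, 2, 1]  # hypothetical ratings from 5 labelers
    rating = severity_rating(example_ratings)
    print(rating, severity_description(rating))
```

For the hypothetical ratings above, the mean is 1.6, which rounds to 2 and maps to `Strong`.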

Categories

We organized the profanity into the following categories:

- sexual anatomy / sexual acts

- bodily fluids / excrement

- sexual orientation / gender

- racial / ethnic

- mental disability

- physical disability

- physical attributes

- animal references

- religious offense

- political
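As a rough sketch of how the columns and categories might be used for something like username filtering, the Python below reads rows with the column names listed above, picks terms by severity and category, and checks a string against them. It assumes the dataset is distributed as a CSV with those columns as its header; the sample rows are invented placeholders rather than real entries, and the helper names are hypothetical.

```python
import csv
import io

# Placeholder rows standing in for the real file. The terms and ratings are
# invented for illustration; only the column and category names come from
# the descriptions above.
SAMPLE_CSV = """text,canonical_form_1,canonical_form_2,canonical_form_3,category_1,category_2,category_3,severity_rating,severity_description
placeholder_a,placeholder_a,,,animal references,,,1.2,Mild
placeholder_b,placeholder_a,,,animal references,physical attributes,,1.4,Mild
placeholder_c,placeholder_c,,,political,,,2.6,Severe
"""

CATEGORY_COLUMNS = ("category_1", "category_2", "category_3")


def load_rows(handle):
    """Read every row of the CSV into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


def select_terms(rows, min_rating=1.0, categories=None):
    """Return lowercased terms at or above min_rating, optionally by category."""
    wanted = set(categories) if categories is not None else None
    selected = set()
    for row in rows:
        if float(row["severity_rating"]) < min_rating:
            continue
        # A term matches if any of its (up to three) categories is wanted.
        row_categories = {row[col] for col in CATEGORY_COLUMNS if row[col]}
        if wanted is not None and not (row_categories & wanted):
            continue
        selected.add(row["text"].lower())
    return selected


def contains_blocked_term(candidate, terms):
    """Case-insensitive substring check of a candidate string against terms."""
    lowered = candidate.lower()
    return any(term in lowered for term in terms)


if __name__ == "__main__":
    rows = load_rows(io.StringIO(SAMPLE_CSV))
    strong_or_worse = select_terms(rows, min_rating=2.0)
    print(contains_blocked_term("xX_Placeholder_C_Xx", strong_or_worse))
```

To run this against the actual file, replace the `io.StringIO(SAMPLE_CSV)` handle with an opened file. Plain substring matching is the simplest possible check and will flag harmless words that happen to contain a listed term, so a real moderation tool would likely need word-boundary handling or human review on top of it.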

Looking forward...

We'll be adding more languages and profanity annotations over time.

Need a larger set of expletives and slurs, or a list of swear words in other languages (Spanish, French, German, Japanese, Portuguese, etc.)? We love feedback. Reach out to team@surgehq.ai!


Surge AI is a data labeling workforce and platform that provides world-class data to top AI companies and researchers. Interested in $50 of free labels? Fill out our 30-second form and we'll get you started today!

Scott Heiner

Scott runs Business Development and Operations at Surge AI, helping customers get the high-quality human-powered data they need to train and measure their AI. Before joining Surge, he led operations and marketing teams in the media industry.
