Real-World ML Failures: The Violence, Racism, and Sexism Uncaught by Twitter's Content Moderation Systems

Scott Heiner
Nov 9, 2022
Here’s a tweet that’s been live on Twitter for nearly a month:

Here’s one that’s been up for 8 years.

Are Twitter’s content moderation systems any good?

Let’s look at examples of Twitter's content moderation failures to understand how much it needs to improve — and to see the types of unhealthy content that will grow even more prevalent if its systems dissolve entirely. To find these, we asked Surgers on our platform to collect examples of hateful speech, and – crucially! – checked that users on both sides of the political spectrum agreed.
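
As a rough illustration of that agreement check, here's a toy sketch in Python. The field names and the left/right encoding are hypothetical, not our production labeling schema:

```python
# A toy sketch of the cross-group agreement filter, with hypothetical
# field names. An example is kept only if annotators on both sides of
# the political spectrum are represented and all of them marked it hateful.
def both_sides_agree(labels: list[dict]) -> bool:
    sides = {label["side"] for label in labels}
    return sides >= {"left", "right"} and all(label["hateful"] for label in labels)

labels = [
    {"side": "left", "hateful": True},
    {"side": "right", "hateful": True},
]
print(both_sides_agree(labels))  # True, so this example is kept
```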

Example Failures

It's worth noting that many of these tweets have been up for months. Many more can be easily found in minutes.

Toxic Replies

That said, without knowing the exact details of Twitter’s safety rules (though they’re spelled out at a high level here), it’s possible that some of these tweets aren’t actual violations, and so wouldn’t be removed even if Twitter could detect them.

However, it’s known that Twitter downranks and limits the visibility of “toxic” tweets, even when they don’t rise to the level of breaking its rules. In particular, it hides replies it deems toxic behind two “Show More” sections at the bottom of the Tweet Replies page, which means we can see which tweets its AI models do and don't classify as toxic.
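
To make the mechanism concrete, here's a minimal sketch of what threshold-based reply filtering might look like. The toxicity scores, the HIDE_THRESHOLD value, and the two-bucket split are illustrative assumptions, not Twitter's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Reply:
    text: str
    toxicity: float  # score in [0, 1] from some upstream toxicity model

# Hypothetical cutoff: replies scoring at or above it land in the hidden
# "Show More" bucket. Twitter's real thresholds are not public.
HIDE_THRESHOLD = 0.8

def bucket_replies(replies: list[Reply]) -> tuple[list[Reply], list[Reply]]:
    """Split replies into a visible bucket and a hidden 'Show More' bucket."""
    visible = [r for r in replies if r.toxicity < HIDE_THRESHOLD]
    hidden = [r for r in replies if r.toxicity >= HIDE_THRESHOLD]
    return visible, hidden

visible, hidden = bucket_replies([
    Reply("Great thread, thanks for sharing!", 0.02),
    Reply("GYF", 0.10),  # obfuscated toxicity the model under-scores (see below)
])
```

Because the split is visible on the Tweet Replies page, anyone can infer which side of the threshold the model placed a given reply on.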

So – ignoring whether toxic replies should be downranked or not – how do its systems perform?

Here, once again, are replies that Surgers on both sides of the political spectrum deemed toxic and low-quality, but that Twitter’s algorithms failed to detect.

One important point to note, from an ML standpoint, is that Twitter's systems don't appear to apply sophisticated pre-processing to understand word variations, like GYF (go fuck yourself) or L O S E R.
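
As a rough sketch, here's the kind of lightweight normalization that could catch those two patterns before scoring. The regex and abbreviation table are hypothetical examples, not a description of any production pipeline:

```python
import re

# Hypothetical lookup table of obfuscated abbreviations; a production
# system would maintain a much larger, continuously updated lexicon.
ABBREVIATIONS = {
    "gyf": "go fuck yourself",
}

def normalize(text: str) -> str:
    """Undo simple character-level obfuscations before toxicity scoring."""
    # Collapse runs of spaced-out capital letters, e.g. "L O S E R" -> "LOSER".
    # Restricted to capitals so real one-letter words like "a" aren't swallowed.
    text = re.sub(
        r"\b(?:[A-Z] ){2,}[A-Z]\b",
        lambda m: m.group(0).replace(" ", ""),
        text,
    )
    # Expand known abbreviations word by word.
    return " ".join(ABBREVIATIONS.get(word.lower(), word) for word in text.split())

print(normalize("You are a L O S E R"))  # -> "You are a LOSER"
print(normalize("GYF"))                  # -> "go fuck yourself"
```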

Once again, these examples have been up for weeks and months, were remarkably easy to find, and were agreed to be low-quality toxic content by both sides of the political spectrum.

In a future post, we’ll look at perfectly healthy tweets that Twitter is overzealous about downranking – i.e., tweets that add meaningful discussion to conversations but that Twitter nonetheless erroneously hides.

Interested in learning more about content moderation and the health of social media platforms? At Surge AI, we love language – all those rich, dirty little nuances that make understanding it the pinnacle of artificial (and human!) intelligence. That’s why we work with the top companies in the world to help them solve their content moderation and safety issues, with all the subtleties that entails.

Check out our other blog posts to learn more!

Scott Heiner

Scott runs Business Development and Operations at Surge AI, helping customers get the high-quality human-powered data they need to train and measure their AI. Before joining Surge, he led operations and marketing teams in the media industry.
