Dataset of Roe v. Wade Tweets Labeled by Abortion Stance

Last week, the Supreme Court overturned Roe v. Wade, eliminating the constitutional right to an abortion.

What are people saying about this decision, and how are they reacting? Can we predict their sentiment?

To find out, we scraped 5,000 tweets about abortion and Roe v. Wade and labeled each one by its abortion stance (“pro-choice”, “pro-life”, “neutral”, “both”). If you’d like to explore the data yourself, download it for free here!

Here are some examples.

Analysis of the Abortion / Roe v. Wade Dataset

Overall, what is Twitter’s stance towards abortion and Roe v. Wade? ~72% of (non-news) tweets were pro-choice or dismayed by the Supreme Court’s decision, while ~28% were pro-life or celebratory.

Breakdown of Twitter sentiment towards Roe v. Wade
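
As a rough illustration of how a breakdown like this could be computed from the stance labels, here is a minimal sketch. It assumes the percentages are taken over tweets labeled "pro-choice" or "pro-life" only, with "neutral" and "both" set aside; the post does not spell out how "non-news" was defined, so that exclusion rule is an assumption. The function name and the toy labels are hypothetical and do not come from the dataset.

```python
from collections import Counter

def stance_breakdown(labels, exclude=("neutral", "both")):
    """Return each remaining stance's share (in percent) after dropping excluded labels."""
    # Assumption: excluded labels stand in for news-like or mixed tweets.
    kept = [label for label in labels if label not in exclude]
    counts = Counter(kept)
    total = len(kept)
    return {stance: 100 * n / total for stance, n in counts.items()}

# Toy labels invented for illustration only (not real data).
toy_labels = ["pro-choice"] * 6 + ["pro-life"] * 2 + ["neutral"] * 2
print(stance_breakdown(toy_labels))
```

On the real data you would pass in the stance column of the downloaded file instead of the toy list.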

Were there any user-level features that could predict someone’s stance? We trained a machine learning model on users’ profiles. Here are the most predictive features, all of them words or phrases from user bios:

Pro-Life Predictive Features

Pro-life users were…

20.4x more likely to have christ in their bio
16.1x more likely to have maga in their bio
14.6x more likely to have jesus in their bio
12.4x more likely to have conservative in their bio
11.7x more likely to have blocked in their bio ("Blocked by snowflake losers who wet their pants.")

11.6x more likely to have 2a (shorthand for 2nd Amendment) in their bio ("Retired Army MP. Former State Corrections Officer. Pro 2A, Moderate Independent Conservative. I am who I am. Deal with it.")

11.5x more likely to have ultra (as in Ultra MAGA) in their bio
11.2x more likely to have kag (Keep America Great) in their bio
11.2x more likely to have grandfather in their bio ("Husband Father Grandfather Soldier American")

10.9x more likely to have identify (as a pronoun joke) in their bio ("Pro life mom 🐬✝️. Independent Voter. I identify as a threat. Try/Me.")

10.7x more likely to have ifbap (I follow back all patriots) in their bio
7.3x more likely to have catholic in their bio
7.3x more likely to have usa in their bio
6.8x more likely to have independent in their bio
5.8x more likely to have gift (usually in a religious context) in their bio ("#ChildofGod, mother, friend. Stumbled into a #memoir writers' workshop and have been stuck in the genre ever since. Life is a precious gift, so #prolife.")

5.8x more likely to have christian in their bio
5.8x more likely to have veteran in their bio
4.9x more likely to have god in their bio
4.4x more likely to have democrats in their bio ("Fuck democrats, their anti-American policies are bad for our country")

4.4x more likely to have politicians in their bio ("Conservative. ❤ the USA. MAGA!! Swamp Politicians must go. Term limits!")

4.4x more likely to have government in their bio
4.3x more likely to have gettr in their bio
4.2x more likely to have freedom in their bio
4.1x more likely to have born in their bio
3.9x more likely to have patriot in their bio
3.9x more likely to have church in their bio
3.6x more likely to have army in their bio
3.4x more likely to have trump in their bio
3.3x more likely to have father in their bio
3.2x more likely to have truth in their bio
3.1x more likely to have wife in their bio
2.9x more likely to have retired in their bio
2.9x more likely to have mother in their bio
2.9x more likely to have republican in their bio

Pro-Choice Predictive Features

Pro-choice users were…

7.5x more likely to have blm in their bio
6.5x more likely to have she/her in their bio
6.5x more likely to have democracy in their bio
5.1x more likely to have politics in their bio
4.5x more likely to have liberal in their bio
4.1x more likely to have author in their bio
3.8x more likely to have blue in their bio
3.4x more likely to have resist in their bio
3.4x more likely to have democrat in their bio
3.1x more likely to have pro-choice in their bio
3.1x more likely to have vote in their bio
3.1x more likely to have editor in their bio
2.7x more likely to have reading in their bio
2.7x more likely to have white in their bio ("they/them • 17 • white")

2.7x more likely to have atheist in their bio
2.4x more likely to have animal in their bio ("Lover of animals, food, cooking, family, friends. Former R. Trying to be nicer 😣 ")

2.4x more likely to have tech in their bio ("Smart domestic policy research. Experts in: governance, tech, education, national security, U.S. politics, elections, and institutional reform @BrookingsInst.")

2.4x more likely to have streamer in their bio
3.4x more likely to have he/him in their bio
2.1x more likely to have she in their bio
2.1x more likely to have stop in their bio ("Get religion out of govt, stop telling women what to do with their bodies and pray to whatever God you believe in. Let's get out of other's business.")

2.1x more likely to have lgbtq in their bio
2.1x more likely to have peace in their bio ("I want peace in the 🌎 Hopefully America comes to their senses and vote with their good heart and mind, Democracy, integrity, and truth😷✌💙")

2.1x more likely to have they/them in their bio
2.1x more likely to have survivor in their bio
2.1x more likely to have justice in their bio
2.1x more likely to have she/they in their bio
2.1x more likely to have student in their bio
1.7x more likely to have adhd in their bio ("🏆Live Event Host🃏Comedian🎙I talk in to microphones- yes, I have a podcast 👉 @CUULPodcast. Adult living w ADHD | #KeepGoing (he/him)")

1.7x more likely to have equality in their bio
1.7x more likely to have society in their bio
1.7x more likely to have jewish in their bio
1.7x more likely to have 2022 in their bio ("Vote Blue in 2022 to stop GOPs from cutting SS, gutting Medicare and Medicaid, overturning ACA, and raising taxes on over 1/2 of the American people.")

1.7x more likely to have ukraine in their bio
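
The post does not say exactly how the "Nx more likely" figures were derived from the model. One plausible reading is a simple ratio: the share of bios in one group that contain a word, divided by the share in the other group. The sketch below shows that calculation under this assumption. The tokenizer, the add-one smoothing, the function names, and the toy bios are all illustrative choices, not the authors' method or data.

```python
import re

def bio_tokens(bio):
    """Lowercase a bio and split it into a set of tokens, keeping '/' and '-' inside tokens like she/her."""
    return set(re.findall(r"[a-z0-9]+(?:[/-][a-z0-9]+)*", bio.lower()))

def likelihood_ratio(token, group_bios, other_bios, smoothing=1.0):
    """How many times more likely a bio in group_bios is to contain token than a bio in other_bios."""
    in_group = sum(token in bio_tokens(b) for b in group_bios)
    in_other = sum(token in bio_tokens(b) for b in other_bios)
    # Smoothing (an assumption) avoids dividing by zero when a token never appears in one group.
    p_group = (in_group + smoothing) / (len(group_bios) + 2 * smoothing)
    p_other = (in_other + smoothing) / (len(other_bios) + 2 * smoothing)
    return p_group / p_other

# Toy bios invented for illustration only.
toy_pro_life = ["Christian. Veteran. MAGA", "Husband Father Grandfather", "Catholic mom"]
toy_pro_choice = ["she/her • writer", "BLM. Vote blue", "editor, she/her"]

print(round(likelihood_ratio("she/her", toy_pro_choice, toy_pro_life), 2))
```

With only a handful of bios the ratios are dominated by the smoothing term, so a calculation like this only becomes meaningful on the full set of user profiles.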

Some interesting differences:

Pro-life users are more likely to say politicians and government, usually in the context of corruption ("Conservative. ❤ the USA. MAGA!! Swamp Politicians must go. Term limits!"). Pro-choice users are more likely to say politics and political, usually in the context of an intellectual interest or movement ("politics, film & caffeine (she/her)").
Pro-life users are more likely to mention their age and family relationships (grandfather, father, mother, wife, retired, born), while pro-choice users are more likely to mention their pronouns, race, sexuality, and career (she/her, author, lgbtq, tech, white).
Pro-life users are more likely to say democrats (with a negative connotation), while pro-choice users are more likely to say democrat (as a self-identification).
Pro-life users are more likely to mention freedom and truth in their bios. Pro-choice users are more likely to mention justice and equality.

It’s also interesting to look at users who are strongly predicted to have one stance, but have another. For example, this user is predicted to be Pro-Life…

…but is actually against the overturning of Roe v. Wade.
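
One way to surface cases like this is to filter for users where the model is very confident but the human label disagrees. The sketch below assumes each user has a predicted probability of being pro-life and a human-assigned label. The field names (`p_pro_life`, `label`), the 0.9 threshold, and the toy rows are hypothetical; the post does not describe the model's actual output format.

```python
def confident_mismatches(rows, threshold=0.9):
    """Return rows whose predicted stance is held with high confidence but disagrees with the human label."""
    flagged = []
    for row in rows:
        p = row["p_pro_life"]
        predicted = "pro-life" if p >= 0.5 else "pro-choice"
        confidence = max(p, 1 - p)
        if confidence >= threshold and predicted != row["label"]:
            flagged.append({**row, "predicted": predicted, "confidence": confidence})
    # Most confident disagreements first.
    return sorted(flagged, key=lambda r: r["confidence"], reverse=True)

# Toy rows invented for illustration only.
toy_rows = [
    {"user": "a", "p_pro_life": 0.97, "label": "pro-choice"},
    {"user": "b", "p_pro_life": 0.95, "label": "pro-life"},
    {"user": "c", "p_pro_life": 0.02, "label": "pro-life"},
    {"user": "d", "p_pro_life": 0.60, "label": "pro-choice"},
]

for row in confident_mismatches(toy_rows):
    print(row["user"], row["predicted"], row["label"])
```

Rows flagged this way are worth reading by hand: they may reveal labeling mistakes, or users whose bios simply do not match their stance on this issue.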

How We Labeled the Dataset

Labeling this dataset can be surprisingly tricky, especially if you’re unfamiliar with US politics and the nuances of Roe v. Wade.

For example: if you’re not a fluent English speaker familiar with US culture, would you recognize the sarcasm in this tweet?

Similarly, if you’re a data labeler based in India or the Philippines, it’s unlikely you’ll know who “RBG” (Ruth Bader Ginsburg) was or what her stance on abortion was, and you’ll mislabel many tweets as a result.

That’s why we used a team of skilled, US-based Surgers fluent in social media and politics (e.g., those who also work on our projects with NYU’s Center for Social Media and Politics) to create this dataset.

Here's a screenshot of the labeling UI that Surgers used to categorize tweets.

If you’d like to explore the dataset yourself, download it for free. And if you have ideas on expanding the dataset or making it more useful, follow us on Twitter at @HelloSurgeAI or reach out at hello@surgehq.ai!

Bradley Webb

Bradley runs Surge AI's Product and Growth teams. He previously led Integrity and Data Operations teams at Facebook, and graduated from Dartmouth.
