The GameStop saga, fueled by Discord memes and chatter on /r/WallStreetBets, showed the power of social media sentiment. So how do you know whether to buy the crypto dip or HODL?
We built a free dataset of Reddit crypto comments, labeled by sentiment, to help your machine learning models find out.
The sentiment analysis dataset
The dataset contains 1,000 Reddit comments about cryptocurrency, each labeled as having Positive or Negative sentiment.
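The post doesn't describe the download format, so here is a minimal sketch that assumes the dataset is a CSV file with hypothetical `comment` and `sentiment` columns. It maps the two sentiment labels to binary targets and reports the class balance, a common first check before training a model. The column names and the 0/1 label mapping are assumptions, not part of the published dataset.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Assumed label mapping: the sentiment column is presumed to hold the
# strings "Positive" or "Negative", matching the two categories described.
LABELS: Dict[str, int] = {"Negative": 0, "Positive": 1}


def load_labeled_comments(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Read (comment, label) pairs from CSV lines or an open file object.

    Assumes hypothetical column names "comment" and "sentiment";
    adjust them to match the actual file.
    """
    examples = []
    for row in csv.DictReader(lines):
        label = row["sentiment"].strip()
        # Fail loudly on labels outside the assumed two-class scheme.
        if label not in LABELS:
            raise ValueError(f"unexpected sentiment label: {label!r}")
        examples.append((row["comment"], LABELS[label]))
    return examples


def class_balance(examples: List[Tuple[str, int]]) -> Dict[str, int]:
    """Count how many examples carry each sentiment label."""
    counts = Counter(label for _, label in examples)
    return {name: counts[idx] for name, idx in LABELS.items()}
```

To use it on a downloaded copy, open the file with `newline=""` and pass the file object, e.g. `load_labeled_comments(open("crypto_sentiment.csv", newline="", encoding="utf-8"))`, where the filename is a placeholder. Raising on an unknown label surfaces a format mismatch immediately instead of silently skewing the targets.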
Examples
Here are some example Reddit comments from the dataset.
Positive Sentiment
Negative Sentiment
How we labeled it
Data Labeling Workforce
Labeling crypto sentiment can be surprisingly tricky. To do a good job, you need to understand the community and its jargon: what are diamond hands, tendies, and DCA? Does HODLing mean you’re a bull or a bear?
Unless you’re familiar with the crypto community, it’s difficult to label these comments! That’s why having data labelers with the right skills is essential to creating quality datasets.
For this project, we built a team of Surgers who are both interested in cryptocurrency and heavy Reddit users, and who have worked on our other financial categorization and social media data labeling projects.
Interface
Here's a peek at our labeling UI. Our platform makes it fast to create new labeling jobs, whether through our API or our WYSIWYG editor.
More Surge AI Datasets
Want to build a custom financial dataset? Sign up and create a new labeling project in seconds, or reach out to us for help at hello@surgehq.ai!
Check out our other free datasets:
Data Labeling 2.0 for Rich, Creative AI
Superintelligent AI, meet your human teachers. Our data labeling platform is designed from the ground up to train the next generation of AI — whether it’s systems that can code in Python, summarize poetry, or detect the subtleties of toxic speech. Use our powerful data labeling workforce and tools to build the rich, human-powered datasets you need today.