Dataset of Search Queries and Intents

People phrase search queries in contorted ways, and query variety has a long tail, so understanding the intent behind a search query is difficult. For this dataset, we asked searchers to collect search queries and annotate each one with its exact intent.

Built by an Elite Workforce

Surge AI is a data labeling platform and workforce. We built a special labeling team of Surgers, trained on the nuances of search evaluation, to pore over thousands of search queries and URLs and craft this search query and intent dataset.

Other Datasets

InstructGPT-style Dataset
Build state-of-the-art large language models, in the style of InstructGPT and ChatGPT.
RLHF Dataset for Reinforcement Learning with Human Feedback
Build state-of-the-art AI by training your large language models on human feedback.
Japanese Hate Speech, Insults, and Toxicity Dataset
A dataset of online comments in Japanese that contain hate speech, insults, and toxicity.
Google Search Quality Dataset
This Google Search Quality dataset contains search queries, intents, result URLs, and a human-evaluated rating.
Search Evaluation Dataset
This search evaluation dataset contains search queries, the intent behind each search query, result URLs, and a human-evaluated search quality rating.
Twitter Sentiment Analysis Dataset
1000+ tweets, classified by sentiment.

Love language?
So do we.

We're a team of engineers and researchers from Google, Facebook, Harvard, and MIT. We're building the modern data labeling infrastructure needed to power the next wave of AI.

Our data labeling platform and data labeling teams help AI companies around the world solve their core machine learning and language problems — from detecting hate speech and categorizing user reviews, to training powerful language models.
