Google Search Quality Dataset

Measuring and training search engines is hard, especially when user-engagement signals like clicks and dwell time are imperfect measures of quality. This is why modern search engines like Google, Bing, and Neeva use human evaluation to train and measure search quality. Explore this dataset of Google search quality ratings and start training better search ranking algorithms today.
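As a rough illustration of how human ratings can be used to measure search quality, here is a minimal Python sketch that scores a ranked result list with NDCG (normalized discounted cumulative gain), a common ranking metric. The 0 to 3 rating scale and the example lists are assumptions made for illustration; they are not taken from this dataset's actual schema.

```python
import math


def dcg(ratings):
    """Discounted cumulative gain for ratings listed in ranked order."""
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(ratings))


def ndcg(ratings):
    """DCG divided by the DCG of the ideal (best-first) ordering."""
    ideal = dcg(sorted(ratings, reverse=True))
    return dcg(ratings) / ideal if ideal > 0 else 0.0


# Hypothetical human ratings for one query's results, in ranked order
# (assumed scale: 0 = irrelevant, 3 = excellent).
best_first = [3, 2, 1, 0]
worst_first = [0, 1, 2, 3]

print(f"best-first NDCG:  {ndcg(best_first):.3f}")
print(f"worst-first NDCG: {ndcg(worst_first):.3f}")
```

A ranking that places the highest-rated results first scores 1.0, and any other ordering of the same results scores lower, which gives a single number for comparing ranking algorithms against human judgments.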


Can you build a better Google?



Built by an Elite Workforce

Surge AI is a data labeling platform and workforce. We built a special labeling team of search evaluators (Surgers trained on the nuances of human evaluation) to pore over thousands of search queries and URLs and craft this search evaluation dataset.

Other Datasets

Japanese Hate Speech, Insults, and Toxicity Dataset
A dataset of online comments in Japanese that contain hate speech, insults, and toxicity.
Dataset of Search Queries and Intents
This dataset contains search queries, as well as the user's intent when performing the search query.
Search Evaluation Dataset
This search evaluation dataset contains search queries, the intent behind each search query, result URLs, and a human-evaluated search quality rating.
Twitter Sentiment Analysis Dataset
1000+ tweets, classified by sentiment.
Email Spam Dataset
A dataset of real Spam and Not Spam emails, including whether or not they were caught by Gmail's spam filters.
Fake News Dataset
A dataset of social media posts containing fake news.

We're Launching More!


Love language?
So do we.

We're a team of engineers and researchers from Google, Facebook, Harvard, and MIT. We're building the modern data labeling infrastructure needed to power the next wave of AI.

Our data labeling platform and data labeling teams help AI companies around the world solve their core machine learning and language problems — from detecting hate speech and categorizing user reviews, to training powerful language models.


Data Labeling for the
Richness of AI

Build human-powered datasets using our global labeling workforce and platform.