Surge AI | Human Infrastructure for NLP

Edwin Chen

Edwin is the founder and CEO of Surge AI.

How Anthropic uses Surge AI to Train and Evaluate Claude

Learn how Anthropic partnered with Surge AI to gather high-quality human feedback at scale using Surge AI's RLHF platform, resulting in one of the safest and most advanced large language models on the planet.

HellaSwag or HellaBad? 36% of this popular LLM benchmark contains errors

Edwin Chen

We analyzed HellaSwag, a popular LLM benchmark, and found errors in 36% of its rows.

30% of Google's Emotions Dataset is Mislabeled

Edwin Chen

Last year, Google released their “GoEmotions” dataset: a human-labeled dataset of 58K Reddit comments categorized according to 27 emotions. The problem? A whopping 30% of the dataset is mislabeled! Check out some of the egregious errors, and learn how to build better datasets.

How Surge AI Built OpenAI's GSM8K Dataset of 8,500 Math Problems

Edwin Chen

We built a dataset of 8,500 Grade School Math Problems for OpenAI. The goal of the dataset: to train language models like GPT-3 to solve natural language math problems and measure their reasoning ability. Learn about our process in this blog post!

We asked 100 humans to draw the DALL·E prompts

Edwin Chen

Where do human artists fit in a world of rich, creative AI? We asked 100 Surgers to draw the DALL·E prompts.

5 Examples of the Importance of Context-Sensitivity in Data-Centric AI

Edwin Chen

Data-centric AI requires radically rethinking the data that goes into your models. Surge AI provides data labelers with the skills needed to produce context-sensitive labels.

The AI Bottleneck: High-Quality, Human-Powered Data

Edwin Chen

In theory, AI has blown past our wildest dreams; in practice, Siri can’t even tell us the weather. The problem? Creating high-quality datasets to train and measure our models is still incredibly difficult. We should be able to gather 20,000 labels for training a Reddit classifier in a single