How Anthropic uses Surge AI’s RLHF platform to train their LLM Assistant on Human Feedback

Edwin Chen
Mar 9, 2023
“The team at Surge AI understands the unique challenges of training large language models and AI systems. Their human data labeling platform is tailored to provide the unique, high-quality feedback needed for cutting-edge AI work. Surge AI is an excellent partner to us in supporting our technical AI alignment research.” – Jared Kaplan, Anthropic Co-Founder

Overview: Anthropic is One of the World’s Leading AI and LLM Companies

Anthropic is one of the world’s leading AI companies, building safe, state-of-the-art large language models and AI systems. The company was founded by a team of former OpenAI and Google Brain researchers, and its AI Assistant, Claude, is already one of the safest, most capable LLMs on the planet – even surpassing OpenAI’s ChatGPT in a multitude of domains.

The Problem: Building Trustworthy, High-Quality Human Feedback

Researching and leveraging human data scaling laws – the power of human feedback for making AI systems safer and more useful – has been a key focus for Anthropic from the start. Their research paper, “Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback,” has been one of the most important advances in the field, exploring how to train a general language assistant to be helpful without providing harmful advice or exhibiting bad behaviors.

However, building the data pipeline needed to gather high-quality human feedback at scale was challenging. Finding people with the skills to annotate and label language model outputs was laborious, building robust quality control infrastructure was difficult, and developing labeling tools took time away from their core research.

Anthropic evaluated several crowdsourcing and data labeling platforms, but found that they lacked large language model expertise and that it was difficult to get the quality they needed.

After learning of Surge AI’s work with other key AI labs and large language model companies, Anthropic began leveraging the Surge AI LLM platform for their RLHF human feedback needs.

The Solution: Rich Human Feedback via Surge AI’s RLHF Platform

Anthropic has called Surge AI’s RLHF and human data a game changer for their research.

Some of the key features Anthropic leverages include:

  1. Proprietary quality control technology. Large language models are remarkably sensitive to the low-quality data typical of other data labeling companies, which often sets an LLM team's work back by years. Our advanced human/AI algorithms and technology were built by our team of scientists and researchers, who’ve worked on this problem for decades.
  2. Domain expert labelers. The Surge AI platform was designed for the next generation of AI. As language models become increasingly advanced, they need increasingly sophisticated human feedback to teach them, whether it’s learning to solve mathematical problems (see our collaborations with OpenAI), learning to code, or learning to converse like an expert in domains like law, medicine, business, and STEM subjects. Our domain expert labeling teams provide the depth and range of skills Anthropic needs to teach LLMs the breadth of human language.
  3. Rapid experimentation interface. As researchers in a fast-evolving field, Anthropic’s scientists and engineers need to be able to design and launch new jobs quickly, without spending months writing long guidelines or iterating 10 times in search of the quality they need. Our APIs and RLHF interfaces allow them to integrate their own tools and platforms for long-running jobs, while spinning up new jobs on the fly.
  4. Red teaming tools. To keep LLMs safe, Surgers red team Anthropic’s current safety defenses to uncover new holes to be patched.
  5. RLHF and Language Model expertise. Instead of constant recalibration to discover how to make things work, our deep experience in RLHF and language models ensures that Anthropic gets the high-quality data they need every time – based on proven methods we’ve uncovered from hundreds of Surge-internal experiments.

Results: Anthropic’s Claude – a Safe, Highly Capable, State-of-the-Art AI Assistant

RLHF is widely recognized as the key ingredient that separates LLMs like ChatGPT from the previous generation. And with their own creative takes on leveraging the rich human feedback that Surge AI enables, Anthropic has taken this advance one step further.

With our partnership, Anthropic has been able to build one of the safest, most advanced LLMs on the planet. And as they continue to push the boundaries of human feedback in novel ways, Surge AI has been thrilled to partner with them on their journey.

Want to learn more about RLHF and how to build your own safe, helpful, state-of-the-art LLMs? Turn to the human feedback experts that the other experts trust. Reach out to our LLM team and get started today.

Edwin Chen

Edwin oversees Surge AI's Engineering and Research teams — whether it's helping customers train large language models on human feedback, building content moderation algorithms to detect hate speech and spam, or scaling up an elite data labeling workforce. He previously led AI, Data Science, and Human Computation teams at Google, Facebook, and Twitter, and studied mathematics and linguistics at MIT.
