How Surge AI Helped Cohere Train Their Next-Gen Command LLM




Large Language Models


Reinforcement Learning from Human Feedback and Instruction Following

“Surge AI's LLM platform and managed service took the complexity out of human data labeling and annotation for training next-gen LLMs on custom human-written data.

Their team handled the end-to-end process of high-quality data collection, and provided best tips and expert advice – thereby freeing up our engineers to focus on core model development.

With Surge AI as a trusted partner, we rapidly accelerated time-to-market with our command-following models, recently achieving top results on the HELM benchmark.”
Alex Wang, Research Scientist, Cohere

Cohere is a Leading Pioneer in Language AI
Cohere empowers every developer and enterprise to build incredible products with world-leading natural language processing (NLP) technology while keeping their data private and secure. Cohere enables businesses of all sizes to explore, generate, search for, and act upon information in a natural and intuitive manner, deploying across multiple cloud platforms in the data environment that works best for each customer.

The Problem

Building Trustworthy, High-Quality Human Infrastructure

Training large language models on high-quality human feedback data has been one of the key advances in LLM development over the past year.

However, building high-quality human feedback infrastructure is challenging: it requires a combination of technical, operational, and data expertise. On the technical side, Cohere needed easy-to-use labeling tools and robust quality control algorithms. On the operational side, they needed large-scale teams of workers sophisticated enough to teach their models a diverse range of language-based skills. On the data side, they needed to develop guidelines that capture the diverse types of human feedback required to make their language models effective in real-world use cases.

Cohere evaluated several data labeling vendors, but found that they lacked real-world large language model experience and that their quality did not meet Cohere’s bar.

After learning of Surge AI’s research and experience in the large language model space, Cohere began leveraging Surge AI’s LLM platform to train and evaluate their models on high-quality human data.

The Solution

Rich Human Feedback via Surge AI’s LLM Platform

Some of the key features Cohere leverages include:

  1. Proprietary quality control technology: Large language models are remarkably sensitive to the kind of low-quality data typical of other data labeling companies, which often silently degrades model performance. Our quality control technology, which combines human and AI methods, was built by our team of scientists and researchers, who have worked on this problem for decades.
  2. Managed service: To perform human labeling, engineers and researchers often have to spend more than 50% of their time building frontend tools, recruiting labelers, training them, answering their questions, and monitoring their quality. Our managed service takes on this work, so that Cohere’s team can focus on scaling their models instead of on operational details.
  3. Large language model and human data expertise: Surge AI’s deep experience in collecting human data for training language models and generative AI ensures that Cohere gets the high-quality data they need from the start, based on proven methods we have developed through hundreds of internal Surge AI experiments.

Cohere's Command

Highly Powerful, Intelligent LLM that Understands Users’ Commands

After training on the human data that Surge AI provided, the Cohere team saw significant improvements in their new Command model. The latest iteration of Cohere’s Command model ranks competitively in the Stanford HELM rankings, and the Cohere team continues to improve the model on a weekly basis for their customers.

Human data is a crucial element that sets the newest generation of LLMs like ChatGPT apart from their predecessors. By gathering rich, high-quality human-written data at scale, Cohere is building models that take performance even further.

“Cohere’s Command Beta model gained the top spot in the Stanford HELM (Holistic Evaluation of Language Models) earlier this month. The startup’s generative model that’s conditioned to respond well to single-statement commands stood out among 36 LLM models, including Meta’s Galactica, OpenAI’s Davinci, Google’s Flan, Bloom and others.”
