Getting started with the Python SDK

If you need programmatic access to the Surge platform, look no further! You can use the Surge API to manage all aspects of your labeling project. Our open-source Python SDK provides convenient access to our API, with functions for creating new projects, running tasks, and much more.

In this blog post, we'll walk through the setup of a sample project from start to finish using our Python SDK.

1. Installation

If you haven't already, sign up for an account on Surge. Once you are logged in, go to your Profile page and copy your API key. This will be used to authenticate your API requests.

Next, use pip to install our Python package:

<code>pip install --upgrade surge-api</code>

2. Authentication

Your API key carries many privileges, so be sure to keep it secure! Do not share it in publicly accessible areas such as GitHub, client-side code, and so forth.

To use your API key, assign it to <code>surge.api_key</code>. The Python SDK will then automatically send this key to authenticate each request.

Alternatively, you can set your API key as an environment variable.

<code>export SURGE_API_KEY=YOUR-API-KEY</code>
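Here is a minimal sketch of wiring the two options together, assuming the key is stored in the `SURGE_API_KEY` variable shown above. The helpers `load_api_key` and `configure_sdk` are hypothetical names and not part of the SDK; only the `surge.api_key` attribute comes from this post.

```python
import os


def load_api_key(env=None, var="SURGE_API_KEY"):
    """Return the API key stored in an environment variable.

    Hypothetical helper for illustration; it is not part of the SDK.
    """
    env = os.environ if env is None else env
    key = env.get(var, "").strip()
    if not key:
        raise RuntimeError(f"{var} is not set; export it before running this script")
    return key


def configure_sdk():
    """Assign the key to surge.api_key so the SDK can authenticate each request."""
    import surge  # imported lazily so load_api_key works without the SDK installed

    surge.api_key = load_api_key()
    return surge
```

Reading the key from the environment keeps it out of source files, which helps avoid committing it to a public repository by accident.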

3. Creating a project

Now, it's time to create a new labeling project. Let's make one to gather information about movie reviews.

Each project should have a list of questions to be completed by the labeling workforce. Surge supports a wide range of question types, including multiple choice, free response, text tagging, and bounding boxes.

To create new questions, instantiate the desired <code>Question</code> objects. In this project, we will ask labelers to tag named entities in the movie review (text tagging), determine its sentiment (multiple choice), and write their very own review (free response).

Notice that the text associated with the text tagging question is <code>{{review}}</code>, which is a Handlebars expression. The actual movie review will automatically replace this placeholder once we import data, which we'll do in the upcoming task creation step.
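To make the placeholder behavior concrete, here is a rough illustration of the substitution idea in plain Python. It is not Surge's renderer: the `render` function is hypothetical, handles only simple `{{name}}` placeholders, and the review text is made up.

```python
import re

# Matches {{name}} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template, fields):
    """Replace each {{name}} in template with fields[name].

    Hypothetical helper for illustration only; unknown names are left untouched.
    """
    def substitute(match):
        name = match.group(1)
        return str(fields.get(name, match.group(0)))

    return PLACEHOLDER.sub(substitute, template)


task_fields = {"review": "A sharp, funny thriller."}  # placeholder review text
print(render("{{review}}", task_fields))  # prints: A sharp, funny thriller.
```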

Once the questions have been created, simply pass them into the method <code>surge.Project.create()</code>.
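A sketch of what this step might look like, under stated assumptions: the classes `TextTaggingQuestion`, `MultipleChoiceQuestion`, and `FreeResponseQuestion` are assumed to live in `surge.questions` and to accept `text` and `options` keyword arguments, and `surge.Project.create()` is assumed to accept `name`, `instructions`, and `questions`. Please confirm these against the API documentation. The question wording, tag options, project name, and both helper functions are illustrative.

```python
def build_questions(questions_module):
    """Build the three questions described above.

    `questions_module` is expected to be `surge.questions`; the class names and
    keyword arguments used here are assumptions to check against the API docs.
    """
    q = questions_module
    return [
        # Text tagging: the {{review}} placeholder is filled in per task.
        q.TextTaggingQuestion(text="{{review}}", options=["Person", "Place", "Date"]),
        # Multiple choice: overall sentiment of the review.
        q.MultipleChoiceQuestion(
            text="What is the sentiment of this review?",
            options=["Positive", "Neutral", "Negative"],
        ),
        # Free response: the labeler writes a review of their own.
        q.FreeResponseQuestion(text="Write your own review of this movie."),
    ]


def create_movie_review_project(surge_module):
    """Create the project; `surge_module` is the imported `surge` package."""
    return surge_module.Project.create(
        name="Movie review labeling",  # illustrative name
        instructions="Tag entities, rate the sentiment, then write your own review.",
        questions=build_questions(surge_module.questions),
    )
```

With the SDK installed and your API key configured, you would run `import surge` and `import surge.questions`, then call `create_movie_review_project(surge)`. Passing the module in as an argument also makes it easy to try the helpers with a stand-in object before touching the real API.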

Voila! You now have a new project!

4. Creating tasks

The next step is to add data that you want to label. After you upload a dataset, each data point becomes a task that can be sent to a worker for labeling.

One way to create tasks is by formatting your data as a list of dictionaries. You then add the list to your current project by using the method <code>project.create_tasks()</code>.
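As a sketch, two tasks might be built like this. The `review` key matches the `{{review}}` placeholder used in the text tagging question; the `movie` key, the review text, and the `add_review_tasks` helper are illustrative assumptions. Here `project` stands for the object returned when the project was created.

```python
reviews = [
    {"movie": "Parasite", "review": "Placeholder review text for Parasite."},
    {"movie": "Joker", "review": "Placeholder review text for Joker."},
]


def add_review_tasks(project, rows, required_field="review"):
    """Check that every row has the field the questions reference, then upload.

    Hypothetical wrapper around project.create_tasks(), the method named above.
    """
    missing = [i for i, row in enumerate(rows) if required_field not in row]
    if missing:
        raise ValueError(f"rows {missing} lack the '{required_field}' field")
    return project.create_tasks(rows)
```

Calling `add_review_tasks(project, reviews)` would then upload both rows, one task per dictionary.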

In this case, we created two labeling tasks for the workforce: one for Parasite and another for Joker. Workers will answer the questions we defined earlier using this review data.

You can also create tasks in bulk by uploading a local CSV file. The header of the CSV file must specify the fields that are used in your tasks.
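For illustration, the snippet below writes a small CSV whose header row names the task fields, then hands the file to the SDK. The `create_tasks_from_csv` method name is an assumption based on the SDK's README, so verify it against the current documentation. The column names and helper functions are hypothetical.

```python
import csv


def write_reviews_csv(path, rows, fieldnames=("movie", "review")):
    """Write rows to a CSV whose header names the fields used by the tasks."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(fieldnames))
        writer.writeheader()
        writer.writerows(rows)
    return path


def read_header(path):
    """Return the header row so it can be checked before uploading."""
    with open(path, newline="", encoding="utf-8") as f:
        return next(csv.reader(f))


def upload_csv(project, path):
    """Upload the file; create_tasks_from_csv is assumed, not confirmed by this post."""
    return project.create_tasks_from_csv(path)
```

Checking `read_header(path)` before uploading is a cheap way to catch a column name that does not match a placeholder such as `{{review}}`.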

Whenever a task is created, the project is automatically launched if it is not already in progress.

Now that all of the data for tasks has been added, you can see how a task looks by clicking Preview task for the project on the Surge platform.

5. Creating gold standards

Gold standards are used to assess the quality of each worker in the labeling workforce. Once a gold standard answer is set for a task, every worker is required to complete that task.

On the Surge platform, the analytics pages show how well workers performed on your gold standards, as well as other quality control metrics.

To set gold standards using the Python SDK, simply pass in a list of the correct answers to each question.
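A hedged sketch of that step, assuming each task object exposes a `set_gold_standard()` method that takes the list of answers. Check the API documentation for the exact method name and for the answer format, especially for text tagging questions. The `set_gold` helper is hypothetical.

```python
def set_gold(task, answers, num_questions):
    """Attach gold standard answers to a task, one answer per question.

    Hypothetical wrapper; task.set_gold_standard() is an assumed SDK method.
    """
    if len(answers) != num_questions:
        raise ValueError(f"expected {num_questions} answers, got {len(answers)}")
    return task.set_gold_standard(list(answers))
```

For the three-question project in this post, you would pass three answers in the same order as the questions.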

Wrapping up

In this post, we successfully launched a labeling project on Surge using our Python SDK.

There are plenty of other features in the API that weren't mentioned here, which you can learn more about in our official API documentation.

You can also check out more details on our open-source GitHub repo. If you have any feature requests or bug reports, feel free to file an issue. We're more than happy to hear your feedback to help drive our product roadmap!

Labeling in action: an example of a text tagging (NER) question on the Surge platform.


Andrew Mauboussin

Andrew oversees Surge AI's Engineering and Machine Learning teams. He previously led Twitter's Spam and Integrity efforts, and studied Computer Science at Harvard.

