Surge AI | Research Engineer, Coding Evaluation & Training Data

About Us

Our mission is to raise AGI with the richness of human intelligence — curious, witty, imaginative, and full of unexpected brilliance.

Surge was founded by engineers and researchers who dreamed of building the next generation of AI. We're building a platform that powers the world's most advanced models in partnership with companies like OpenAI, Anthropic, Meta, and Google.

At Surge, we believe the path to AGI isn't just about scaling compute—it's about embracing the unlimited ceiling of human intelligence and creativity in the data that shapes these systems. Our platform combines elite human expertise with cutting-edge tools for scalable oversight, from building rich RL environments to conducting rigorous evaluations that go beyond benchmarks. We've run a profitable business from day one without raising venture funding.

The Role

As a Research Engineer, Coding Evaluation & Training Data at Surge, you'll sit at the intersection of software engineering and product to build and run the systems that teach frontier models how to code.

You’ll own end-to-end coding data projects for top AI labs, designing tasks, RL environments, and evaluation schemes that reflect real-world software engineering. This is an ideal role for a software engineer who wants to keep using their engineering skill set every day, but is most excited by training data quality, agentic evaluation, and system design (humans + models + tools) rather than traditional product feature work.

What You'll Do

Own end-to-end coding data projects, from initial scoping and pilot design through execution, iteration, and scale-up.
Design agentic training workflows and task structures that mirror real-world SWE work (e.g., refactoring, debugging, code review, large-repo navigation, tool use).
Define and iterate on rubrics, golden sets, and reward signals that capture true engineering value, not just “does it compile.”
Evaluate data and worker output with strong SWE taste, and make calls about what meets the bar for frontier training.
Design and run qualification processes for coding workers, including hands-on assessments of their coding ability.
Set up or partner on complex technical environments (e.g., containers, repos, test harnesses, sandboxes, code execution infrastructure).
Partner with technical staff at our clients to translate high-level training goals into concrete projects and technical environments.
Collaborate closely with Surge engineering, product, and operations to improve our coding data products, internal tools, and execution processes.

What We're Looking For

3–6+ years of professional software engineering experience building and maintaining real systems.
Strong coding ability in at least one mainstream language and comfort working in production codebases.
High “taste” for good engineering: you care about correctness, code quality, and how real engineering teams actually work.
Ability to reason about and debug technical environments, including containers, dependencies, and automated test setups.
Interest in owning projects end-to-end, including scoping, workflow design, execution, and continuous improvement.
Excellent written and verbal communication skills, with the ability to speak credibly with senior client engineers and translate fuzzy goals into actionable plans.
Excitement about AI/ML systems and the role of data, evaluation, and reward design in improving agentic coding capabilities.

How to Apply

We created a short exercise to help us get to know you better. Once you’ve completed it, please email your submission and resume to careers@surgehq.ai and include the role you’re applying for in the subject line.

Help us raise AI for the real world
