Introducing CodeClash

Benchmarking Goal-Oriented Software Engineering

Nov. 3, 2025 • by John Yang, Kilian Lieret


Existing coding benchmarks evaluate Language Models (LMs) on tasks.

Implement a function, fix a bug, write a test.

We tell models what to do, they give it a shot, and we evaluate correctness with unit tests.

This approach has driven impressive progress in LMs' code generation capabilities over the past few years.

However, as LM scores have skyrocketed on evaluations like HumanEval and SWE-bench, this improvement also raises the question: Is the future of code evals just making harder tasks?

Our answer is founded in a simple question: Why do we write code?

To achieve goals!

Software developers aren't just incessantly solving tickets with no aim. We code to improve user retention, increase revenue, reduce costs, achieve higher customer satisfaction - the list is endless.

Towards these goals, we decompose objectives into steps, prioritize them, and strategically decide which solutions to pursue.

And it's a continuous, often competitive loop. Propose changes, deploy them, analyze real-world feedback (e.g., metrics, user behavior, A/B test results), then do it all again. From this perspective, tasks are but small, isolated pieces tied together by an overarching goal.

So we posit - perhaps the next frontier in code evaluation is not harder tasks, but goal-oriented software engineering.

To formalize this, we're excited to share CodeClash!

Multiple LM systems compete to build the best codebase for achieving a high-level objective over the course of a multi-round tournament. These codebases implement solutions that compete in a code arena.

Picture Credit to Abe Hou

Crucially, LMs do not play directly. Instead, they iteratively refine code that competes as their proxy.
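To make the loop concrete, here is a minimal toy sketch of a multi-round tournament in which players never compete directly; only their codebases do. Everything in it is an illustrative assumption rather than CodeClash's actual API: the names `Player`, `run_arena`, `run_tournament`, and `make_adaptive_refiner` are hypothetical, the "codebase" is reduced to a single number, and the "LM" is a plain Python function that reads logs from earlier rounds.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-ins: a real codebase is a repository, and a real
# refiner is an LM agent editing that repository.
Codebase = Dict[str, int]
Refiner = Callable[[Codebase, List[str]], Codebase]


@dataclass
class Player:
    name: str
    refine: Refiner
    codebase: Codebase = field(default_factory=lambda: {"bid": 0})
    logs: List[str] = field(default_factory=list)
    wins: int = 0


def run_arena(codebases: Dict[str, Codebase]) -> str:
    """Toy arena: the highest 'bid' wins; ties go to the alphabetically first name."""
    return max(sorted(codebases), key=lambda name: codebases[name]["bid"])


def run_tournament(players: List[Player], rounds: int) -> Dict[str, int]:
    for round_no in range(1, rounds + 1):
        # Step 1: each player revises its own codebase, seeing only its own logs.
        for p in players:
            p.codebase = p.refine(dict(p.codebase), list(p.logs))
        # Step 2: the codebases, not the players, compete in the arena.
        winner = run_arena({p.name: p.codebase for p in players})
        # Step 3: the outcome is fed back as logs for the next round.
        for p in players:
            if p.name == winner:
                p.wins += 1
            p.logs.append(f"round {round_no}: winner={winner}")
    return {p.name: p.wins for p in players}


def static_refiner(codebase: Codebase, logs: List[str]) -> Codebase:
    """Never changes anything."""
    return codebase


def make_adaptive_refiner(name: str) -> Refiner:
    """Raises the bid whenever the previous round was lost."""
    def refine(codebase: Codebase, logs: List[str]) -> Codebase:
        if logs and f"winner={name}" not in logs[-1]:
            codebase["bid"] += 1
        return codebase
    return refine
```

Running three rounds with a static player named `alice` against an adaptive player named `bob` gives `{"alice": 1, "bob": 2}`: `alice` takes the first round on the tie-break, `bob` reads the loss in its logs, raises its bid, and wins the remaining rounds.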

CodeClash enables us to examine models as long-running, continually improving developers.

If you're curious about models using code as the modality to learn, adapt, and improve over time, CodeClash is the playground for you.

Thanks for reading! Check out our paper for the full story. And if you're ready to dive in, here's a quick video to show you how to set up the repository and run your first CodeClash tournament!