CodeClash

Since CodeClash's release, our top priority has been enabling practitioners to improve models as CodeClash competitors and, ultimately, as long-running, autonomous software developers.

As a first step, we're releasing an initial set of 9 arenas, which we're designating as the official "train" split of CodeClash (CC:Train).

Introducing 9 new training arenas for CodeClash!

CC:Train arenas span a range of properties, including:

- Perfect (Chess) vs. imperfect information (Bridge)
- Classical board games (Gomoku) vs. custom competition formats (BattleCode)
- Head-to-head (Chess, Gomoku, BattleCode) vs. multi-player (Figgie, Gomoku, Halite)

Today's models are mainly trained on tasks that use unit tests for verification (e.g., SWE-bench, SWE-smith).

We are curious whether post-training on open-ended, competitive objectives could improve coding capabilities. Some ideas:

- Self-play RL with competition outcomes as rewards
- Transferability
  - Training on Halite II/III will likely lead to better performance on Halite I, but what about Gomoku?
  - Does training on open-ended code tasks (e.g., CodeClash, improving runtime [1, 2, 3]) improve performance on in/out-of-distribution coding benchmarks?
- Mitigating code slop and bad development practices (e.g., single-use scripts, redundant code, poor organization)
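
To make the first idea above a bit more concrete, here is a minimal sketch of how competition outcomes might be turned into scalar rewards for self-play RL. Everything in it is an assumption for illustration: `MatchResult` and `outcome_rewards` are hypothetical names, not part of CodeClash, and the pairwise win/loss scheme is just one simple way to handle both head-to-head and multi-player arenas with the same function.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MatchResult:
    """Outcome of one arena round (hypothetical structure, not CodeClash's API)."""
    players: List[str]          # names of all participants in the round
    scores: Dict[str, float]    # arena-specific final score per player


def outcome_rewards(result: MatchResult) -> Dict[str, float]:
    """Map final scores to zero-sum rewards in [-1, 1].

    Each player's reward is (opponents beaten - opponents lost to),
    divided by the number of opponents. Ties contribute nothing.
    """
    n_opponents = len(result.players) - 1
    if n_opponents < 1:
        raise ValueError("a competition needs at least two players")

    rewards: Dict[str, float] = {}
    for p in result.players:
        others = [q for q in result.players if q != p]
        wins = sum(result.scores[p] > result.scores[q] for q in others)
        losses = sum(result.scores[p] < result.scores[q] for q in others)
        rewards[p] = (wins - losses) / n_opponents
    return rewards


# Head-to-head example: the winner gets +1, the loser -1.
duel = MatchResult(players=["a", "b"], scores={"a": 3.0, "b": 1.0})
duel_rewards = outcome_rewards(duel)

# Multi-player example: two players tie for first, one finishes last.
trio = MatchResult(players=["a", "b", "c"], scores={"a": 5.0, "b": 5.0, "c": 2.0})
trio_rewards = outcome_rewards(trio)
```

Because the rewards sum to zero within a round, the same signal could be used for every participant's rollout in that round. Whether a sparse, outcome-only reward like this is enough to train on is exactly the open question.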

Introducing Training Arenas