Humans & AI [Ep. 2] - Introducing CC:Ladder

Where does AI rank among public solutions by human programmers?

Jan. 15, 2026 • by John Yang


tl;dr We introduce boss battles as a new format for evaluating LMs' coding and reasoning capabilities.

We pitted Claude Sonnet 4.5 against GigaChad, the top-ranked human-written RobotRumble bot, and found that today's best coding models still struggle heavily to improve a suboptimal codebase into one that rivals the best human-written solutions.

Inspired by this finding, we introduce CC:Ladder, a twist on CodeClash that makes evaluating LMs as competitive, long-horizon software developers cheaper and hill-climbable.

How it works

In CC:Ladder, a model begins against the weakest human solution and must win a majority of n rounds to advance to the next, stronger opponent; the model's score is the highest-ranked opponent it defeats.
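The climbing rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not CodeClash's implementation: `play_round` is a hypothetical callback standing in for one full round (the model edits its codebase, then both solutions are simulated), opponents are assumed to be sorted weakest to strongest, a draw is counted as a non-win, and stopping a match as soon as the majority is decided is an assumption made here to save compute.

```python
from typing import Callable, Optional, Sequence


def climb_ladder(
    opponents: Sequence[str],
    play_round: Callable[[str, int], bool],
    n_rounds: int = 5,
) -> Optional[str]:
    """Return the strongest opponent defeated, or None if the first one is not beaten.

    `opponents` is assumed to be ordered from weakest to strongest.
    `play_round(opponent, round_idx)` is a hypothetical callback that returns
    True if the model wins that round (a draw counts as not winning).
    """
    needed = n_rounds // 2 + 1  # strict majority of n rounds
    best_defeated: Optional[str] = None
    for opponent in opponents:
        wins = losses = 0
        for round_idx in range(n_rounds):
            if play_round(opponent, round_idx):
                wins += 1
            else:
                losses += 1
            # Assumption: stop as soon as the majority is decided either way.
            if wins >= needed or losses > n_rounds - needed:
                break
        if wins < needed:
            break  # the climb ends at the first opponent the model cannot beat
        best_defeated = opponent
    return best_defeated
```

With `n_rounds=5`, three wins are enough to advance, and three losses end the climb.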

Some key details:

CC:Ladder has several advantages over the default Elo leaderboard.

Building CC:Ladder

Putting together a ladder for a CodeClash arena depends entirely on how many open-source, human-written solutions are available online.

Given a solution, we (1) check that it compiles and runs properly, then (2) push it as a branch named human/<name> or human/<author>/<name> to the corresponding arena repository (see the branches for Core War and RobotRumble).
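Step (2) could be scripted roughly as follows. This is a hypothetical sketch: the function names, the single-file solution layout and the `origin` remote are assumptions, and the arena-specific compile-and-run check from step (1) is assumed to have happened beforehand. The git subcommands used (`checkout -b`, `add`, `commit -m`, `push -u`) are standard.

```python
import subprocess
from typing import List, Optional


def branch_name(name: str, author: Optional[str] = None) -> str:
    """Build a branch name following the human/<name> or human/<author>/<name> scheme."""
    return f"human/{author}/{name}" if author else f"human/{name}"


def push_commands(path: str, name: str, author: Optional[str] = None,
                  remote: str = "origin") -> List[List[str]]:
    """Return the git commands that publish one solution file as its own branch.

    The new branch is created from whatever commit is currently checked out.
    """
    branch = branch_name(name, author)
    return [
        ["git", "checkout", "-b", branch],
        ["git", "add", path],
        ["git", "commit", "-m", f"Add human solution {branch}"],
        ["git", "push", "-u", remote, branch],
    ]


def publish(path: str, name: str, author: Optional[str] = None, dry_run: bool = True) -> None:
    """Print (dry run) or execute the publishing commands, stopping on the first failure."""
    for cmd in push_commands(path, name, author):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Running with `dry_run=True` first makes it easy to eyeball the branch name before anything is pushed.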

We currently run this workflow manually. Ping us on Slack if you're interested in automating this process or in putting together a new ladder for a different arena!

Initial Findings

Part 1: Ranking human-written solutions

Given n solutions, we make every unique pair of solutions compete t times.

t differs between arenas solely because of compute constraints: Core War simulations run faster than RobotRumble simulations.
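This schedule is a full round-robin: n solutions give n(n-1)/2 unique pairs, so the number of games grows quadratically. For the 264 ranked Core War solutions that is 34,716 pairs, and 1,653 pairs for the 58 RobotRumble solutions, each played t times. A minimal standard-library sketch (function names are illustrative):

```python
from itertools import combinations
from typing import Iterator, Sequence, Tuple


def round_robin(solutions: Sequence[str], t: int) -> Iterator[Tuple[str, str, int]]:
    """Yield (a, b, game_idx) for every unique pair of solutions, t games per pair."""
    for a, b in combinations(solutions, 2):  # each unordered pair exactly once
        for game_idx in range(t):
            yield a, b, game_idx


def num_games(n: int, t: int) -> int:
    """Total games scheduled: n choose 2 pairs, t games each."""
    return n * (n - 1) // 2 * t
```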

We then compute each solution's Elo rating and rank the solutions accordingly. Elo ratings are computed by fitting a Bradley-Terry model to the pairwise win matrix via maximum likelihood estimation with L2 regularization. We set the regularization strength to 0.01 and use a base Elo of 1200 with a slope of 400 to convert log-odds strengths to interpretable ratings.
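A minimal NumPy sketch of such a fit is below. Under Bradley-Terry, P(i beats j) = 1 / (1 + exp(-(θ_i - θ_j))), and the sketch maximizes the log-likelihood minus 0.01·‖θ‖² with plain gradient ascent. The optimizer, step size and iteration count are assumptions, as is reading "slope of 400" as the standard Elo scale (a 400-point gap means 10:1 odds, so a natural-log strength θ maps to 1200 + 400·θ/ln 10).

```python
import numpy as np


def bradley_terry_elo(wins, reg=0.01, base=1200.0, slope=400.0, lr=1.0, steps=5000):
    """Fit Bradley-Terry strengths by L2-regularized MLE and map them to Elo.

    wins[i, j] is the number of games solution i won against solution j
    (draws could be split as 0.5 to each side).
    """
    wins = np.asarray(wins, dtype=float)
    games = wins + wins.T                              # games played between each pair
    step = lr / max(games.sum(axis=1).max(), 1.0)      # scale step by the busiest player
    theta = np.zeros(wins.shape[0])                    # log-odds strengths
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(theta[None, :] - theta[:, None]))  # p[i, j] = P(i beats j)
        # Gradient of sum_ij w_ij * log(p_ij) - reg * ||theta||^2
        grad = (wins - games * p).sum(axis=1) - 2.0 * reg * theta
        theta += step * grad
        theta -= theta.mean()  # likelihood is shift-invariant; the L2 optimum is zero-mean
    # Standard Elo scale: a 400-point gap corresponds to 10:1 odds.
    return base + slope * theta / np.log(10.0)
```

Because θ is centered, the ratings average exactly 1200, which is why a few outliers (such as the lowest-ranked solutions in the lists below) can end up with negative Elo.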

For Core War, the full rankings:

  1. human/toxic: 1408.7
  2. human/forjohn: 1401.9
  3. human/maelstrom: 1396.0
  4. human/silkworm: 1392.2
  5. human/returnofthefugitive: 1386.1
  6. human/unheardof: 1385.3
  7. human/devilstick: 1384.7
  8. human/mascafe: 1379.6
  9. human/cloudburst: 1376.9
  10. human/decoysignal: 1372.2
  11. human/chainlockv02a: 1370.0
  12. human/burningmetal: 1367.7
  13. human/defensive: 1365.0
  14. human/firestorm: 1364.8
  15. human/dawn2: 1362.2
  16. human/mercenary: 1361.5
  17. human/pdqscan: 1358.1
  18. human/lastjudgement: 1351.7
  19. human/rust: 1350.8
  20. human/snowscan: 1350.6
  21. human/frothandfizzle: 1346.6
  22. human/thefugitive: 1346.3
  23. human/blackknight: 1342.6
  24. human/sonofvain: 1340.3
  25. human/dawn: 1339.8
  26. human/goldeneye: 1335.4
  27. human/silking: 1332.1
  28. human/artofcorewar: 1331.9
  29. human/blowrag: 1329.2
  30. human/returnofthejedimp: 1326.9
  31. human/danceoffallenangels: 1324.6
  32. human/azathoth: 1320.9
  33. human/kosmos: 1319.4
  34. human/simplicity: 1314.0
  35. human/armadillo: 1313.3
  36. human/combatra: 1313.2
  37. human/cinammon: 1309.9
  38. human/returnofthependragon: 1306.9
  39. human/numb: 1305.0
  40. human/neith: 1304.3
  41. human/halcyon: 1303.2
  42. human/olivia: 1303.2
  43. human/reepicheep: 1301.3
  44. human/hullab3loo: 1301.0
  45. human/npaperii: 1300.7
  46. human/elvenking: 1298.3
  47. human/gargantuan: 1297.8
  48. human/mandragora: 1296.4
  49. human/safetyinnumbers: 1295.4
  50. human/hullabaloo: 1290.9
  51. human/eccentric: 1290.0
  52. human/thunderstrike: 1289.6
  53. human/impishv02: 1289.2
  54. human/ziggy: 1289.0
  55. human/stylizedeuphoria: 1288.7
  56. human/ironicimps: 1287.6
  57. human/gigolo: 1286.8
  58. human/gremlin: 1285.1
  59. human/borgir: 1283.6
  60. human/unrequitedlove: 1279.4
  61. human/themystery: 1278.0
  62. human/spiritualblackdimension: 1276.2
  63. human/recycledbits: 1273.1
  64. human/jade: 1272.7
  65. human/luca: 1268.9
  66. human/vain: 1268.8
  67. human/bitethebullet: 1268.3
  68. human/disharmonious: 1267.6
  69. human/uninvited: 1267.6
  70. human/revengeofthepapers: 1267.4
  71. human/bulldozed: 1265.7
  72. human/diehard: 1264.2
  73. human/nighttrain: 1263.0
  74. human/blacken: 1262.7
  75. human/sunset: 1261.6
  76. human/devilish202: 1261.4
  77. human/retroq: 1259.8
  78. human/evolcap66: 1259.3
  79. human/fixed: 1258.7
  80. human/nemesis: 1258.5
  81. human/ompega: 1258.2
  82. human/stormkeeper: 1256.1
  83. human/quicksilver: 1255.7
  84. human/slimetest: 1255.3
  85. human/rosebud: 1255.2
  86. human/bluecandle: 1253.0
  87. human/riseofthedragon: 1252.6
  88. human/kryptonite: 1250.0
  89. human/digitalis2003: 1245.4
  90. human/freighttrain: 1245.4
  91. human/electricrazor: 1244.8
  92. human/forgottenlore2: 1244.3
  93. human/timescape10: 1243.4
  94. human/revivalfire: 1240.3
  95. human/hellfire: 1239.7
  96. human/nightterrors: 1238.1
  97. human/thehistorian: 1236.9
  98. human/borg: 1236.7
  99. human/falconv03: 1236.2
  100. human/torment: 1234.1
  101. human/impfinityv4g1: 1232.7
  102. human/behemot: 1230.5
  103. human/returnofvanquisher: 1229.9
  104. human/forgottenlore: 1228.4
  105. human/sputnik: 1228.3
  106. human/unpitq: 1227.8
  107. human/vanquisher: 1227.7
  108. human/blade: 1227.2
  109. human/arrow: 1225.5
  110. human/electrichead: 1225.2
  111. human/lithobolia: 1224.1
  112. human/enigma: 1223.8
  113. human/valkyrie: 1223.5
  114. human/hazylazy: 1223.3
  115. human/shottonothing: 1222.1
  116. human/bigitalshot: 1221.9
  117. human/hazylazyc11: 1221.5
  118. human/alladinscave: 1220.8
  119. human/dust07: 1220.6
  120. human/unpit: 1219.5
  121. human/herbalavenger: 1219.3
  122. human/grendelsrevenge: 1218.8
  123. human/fireandice: 1218.5
  124. human/whitemist: 1218.3
  125. human/macromagic: 1218.0
  126. human/xenosmilus: 1217.3
  127. human/hector2: 1215.3
  128. human/oblivion: 1214.1
  129. human/bpanamax: 1213.9
  130. human/carmilla: 1213.4
  131. human/excalibur: 1213.3
  132. human/simple88v2: 1212.9
  133. human/kusanagi: 1212.8
  134. human/perseus: 1211.7
  135. human/barrage: 1211.1
  136. human/jackinthebox: 1210.4
  137. human/discord: 1209.7
  138. human/boysarebackintown: 1208.8
  139. human/nosferatu: 1208.1
  140. human/pendulum: 1207.4
  141. human/jinx: 1207.0
  142. human/vampsareback02: 1205.1
  143. human/zooom: 1204.8
  144. human/sprawlingchaos: 1204.7
  145. human/eternalexile: 1204.5
  146. human/bloodlust: 1204.1
  147. human/curseoftheundead: 1203.9
  148. human/recon2: 1201.0
  149. human/jackintheboxii: 1200.5
  150. human/blizzard: 1199.8
  151. human/hazyshadeii: 1199.0
  152. human/sneakyb2: 1198.8
  153. human/labomba: 1198.8
  154. human/bluefunk3: 1198.3
  155. human/lithium: 1197.8
  156. human/damageincorporated: 1197.6
  157. human/torcht18: 1197.0
  158. human/probe: 1196.3
  159. human/intotheunknown: 1195.6
  160. human/grilledoctopus05: 1194.4
  161. human/yogibear: 1193.5
  162. human/infiltrator: 1193.1
  163. human/myvamp54: 1192.5
  164. human/claw: 1192.4
  165. human/stoninc: 1192.2
  166. human/chameleon: 1191.7
  167. human/thenextstep88: 1191.3
  168. human/julietandpaper: 1190.4
  169. human/stalker: 1189.8
  170. human/zygote: 1189.7
  171. human/tnt: 1189.1
  172. human/bayonet: 1188.4
  173. human/mason20: 1185.1
  174. human/tornado30: 1184.8
  175. human/bluefunk: 1184.6
  176. human/myvamp37: 1184.3
  177. human/onebite: 1183.8
  178. human/icedragon: 1182.6
  179. human/win: 1181.2
  180. human/soldieroffortune: 1179.0
  181. human/mirage15: 1178.8
  182. human/mirage2: 1178.7
  183. human/nightofthelivingdead: 1178.7
  184. human/flurry: 1177.2
  185. human/blur2: 1176.4
  186. human/blur: 1175.3
  187. human/thermiteii: 1175.2
  188. human/gemoftheocean: 1173.9
  189. human/replicant: 1172.5
  190. human/vamp02b: 1171.2
  191. human/aeka: 1170.6
  192. human/quiz: 1167.8
  193. human/gothik: 1164.0
  194. human/evoltmp88: 1162.1
  195. human/twister: 1161.1
  196. human/agonyii: 1158.8
  197. human/steppingstone: 1157.2
  198. human/abomination: 1155.6
  199. human/phq: 1155.3
  200. human/beholderseye17: 1150.3
  201. human/armorya5: 1149.9
  202. human/foggyswamp: 1149.9
  203. human/elementaldust2: 1149.5
  204. human/heremscimitar: 1149.2
  205. human/pacman: 1148.8
  206. human/leviathan: 1146.3
  207. human/chimerav35: 1146.0
  208. human/leapfrog: 1144.4
  209. human/snake: 1143.9
  210. human/irongate: 1141.6
  211. human/fatexpansionv: 1138.7
  212. human/seventyfive: 1137.6
  213. human/kitchensinkii: 1136.9
  214. human/cannonade: 1133.5
  215. human/lucky3: 1133.3
  216. human/winterwerewolf3: 1133.0
  217. human/blur88: 1132.1
  218. human/leprechaunonspeed: 1130.5
  219. human/stasis: 1130.1
  220. human/agony51: 1128.4
  221. human/ttti: 1127.0
  222. human/thermite10: 1124.5
  223. human/capskeyisstuck: 1124.2
  224. human/sj4a: 1123.4
  225. human/medusasv7x: 1122.7
  226. human/ncdecoy: 1122.2
  227. human/agony31: 1122.2
  228. human/hordesofmicrowarriors: 1121.1
  229. human/sphinxv28: 1118.6
  230. human/rave: 1115.5
  231. human/keystonet13: 1113.6
  232. human/charonv81: 1113.2
  233. human/leprechaun1b: 1106.0
  234. human/nomuckingabout: 1096.6
  235. human/charonv70: 1095.4
  236. human/bscannersliveinvain: 1094.9
  237. human/crimp2: 1092.1
  238. human/crimp: 1090.7
  239. human/killerinstinct: 1088.4
  240. human/imprimis6: 1084.4
  241. human/griffin2: 1083.7
  242. human/requestv20: 1076.7
  243. human/impurge: 1067.2
  244. human/backstabber: 1066.2
  245. human/0stormbringer: 1065.0
  246. human/twilightpitsv60: 1060.2
  247. human/fastfoodv21: 1056.8
  248. human/flashpaper: 1046.7
  249. human/flashpaper37: 1045.9
  250. human/gammapaper30: 1045.4
  251. human/flypaper30: 1040.7
  252. human/hydra: 1026.4
  253. human/precipice: 1025.0
  254. human/trinity: 1022.7
  255. human/paratroopsv21: 1017.9
  256. human/genocide: 1015.6
  257. human/vagabond: 1001.0
  258. human/notepaper: 967.6
  259. human/returnofthelivingdead: 955.5
  260. human/smoothnoodlemap6: 909.9
  261. human/smoothnoodlemap: 887.8
  262. human/dwarf: 864.3
  263. human/validate: 344.1
  264. human/pspace: -889.5

For RobotRumble, the full rankings:

  1. human/entropicdrifter/gigachad: 3219.0
  2. human/entropicdrifter/seven-of-nine: 2627.3
  3. human/entropicdrifter/we-are-borg: 2560.0
  4. human/entropicdrifter/glommerv2: 2456.8
  5. human/mousetail/coward-bot: 2326.5
  6. human/entropicdrifter/glommer: 2250.2
  7. human/mitch84/crw_preempt: 2109.9
  8. human/mitch84/retreat_walk2: 2040.6
  9. human/devchris/black_magic: 2001.7
  10. human/tabaxi3k/black-magic-1: 1994.3
  11. human/mitch84/walk_retreat: 1968.8
  12. human/jammyliu/sixty-nine-line: 1889.7
  13. human/atl15/centerrr: 1838.2
  14. human/clay/diag-lattice: 1719.0
  15. human/gerenuk/gere-ape: 1712.4
  16. human/wolfsleuth/simple: 1656.1
  17. human/essickmango/pickle-up: 1655.9
  18. human/mkap/test: 1638.9
  19. human/ketza/arthur: 1624.4
  20. human/mountain/neuralbot4-3h: 1622.5
  21. human/aaoutkine/silo34: 1618.6
  22. human/anton/om-om: 1594.2
  23. human/mee42/follow-bot: 1594.1
  24. human/lanity/sivuy: 1593.7
  25. human/underscore/bot1: 1589.8
  26. human/mario31313/alpha_13: 1588.9
  27. human/thesmilingturtl/naivefaa: 1587.8
  28. human/aaoutkine/school-bot: 1570.6
  29. human/suddenlyseals/control-center: 1551.4
  30. human/ketza/bob: 1543.2
  31. human/mjburgess/rule99: 1499.7
  32. human/kalkin/maxad: 1498.1
  33. human/mousetail/genetic-robot: 1493.7
  34. human/edward/flail: 1477.2
  35. human/aayyad/testbot: 1427.0
  36. human/anton/anton4000: 1397.8
  37. human/luisa/baselinegere: 1226.0
  38. human/luisa/luisasrobot: 1223.1
  39. human/jay0jayjay/naivestarter: 1168.3
  40. human/aaa/jippty5: 1032.3
  41. human/devchris/first_test: 940.9
  42. human/tabaxi3k/charles: 936.3
  43. human/essickmango/fruity-test: 935.9
  44. human/sbasu3/meek-bot: 499.4
  45. human/jiricodes/jiricodes-bot: 400.0
  46. human/navster8/maginot-line: 397.3
  47. human/kalkin/artemis2: 390.0
  48. human/kalkin/artemis: 340.7
  49. human/mountain/neuralbot2-6h: 331.4
  50. human/sivecano/clouded-mind: 75.9
  51. human/mountain/neuralbot1-1h: 23.5
  52. human/aaoutkine/dark-knight: -55.6
  53. human/navster8/bash-brothers: -496.0
  54. human/ldang/nemo: -496.7
  55. human/ldang/nessy: -538.5
  56. human/anton/wallifier: -911.3
  57. human/happysquid/test: -1624.4
  58. human/anton/anton3000: -1736.7

Part 2: How high do current models climb?

On Core War

On RobotRumble

How to run?

Run your model against CC:Ladder today: set up CodeClash, then run `uv run python ladder.py configs/ladder/<arena>.yaml`, where `<arena>.yaml` specifies the following (using Core War as the example arena):

```yaml
tournament:
  rounds: 5 # Number of rounds the model plays against each opponent
game:
  name: CoreWar
  sims_per_round: 1000
  args: {}
player:
  agent: mini
  name: claude-sonnet-4-5-20250929
  config:
    agent: !include mini/default.yaml
    model:
      model_name: '@anthropic/claude-sonnet-4-5-20250929'
      model_kwargs:
        temperature: 0.2
        max_tokens: 4096
```
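Because a climb ends at the first opponent the model cannot beat, these settings also bound what a run can cost. The sketch below computes that worst case; the `LadderBudget` class is hypothetical (it only mirrors two config fields plus the ladder size) and assumes each opponent is faced at most once for at most `rounds` rounds, each running exactly `sims_per_round` simulations.

```python
from dataclasses import dataclass


@dataclass
class LadderBudget:
    """Hypothetical mirror of the tournament/game fields in the YAML config."""
    rounds: int          # tournament.rounds
    sims_per_round: int  # game.sims_per_round
    num_opponents: int   # number of human solutions on the arena's ladder

    def wins_needed(self) -> int:
        # A strict majority of `rounds` beats an opponent.
        return self.rounds // 2 + 1

    def max_rounds(self) -> int:
        # Worst case: the model faces every opponent and every match goes the distance.
        return self.num_opponents * self.rounds

    def max_sims(self) -> int:
        return self.max_rounds() * self.sims_per_round
```

With the Core War settings above and the 264 ranked Core War solutions, a full climb is capped at 264 × 5 = 1,320 rounds (1,320,000 simulations); a model that stalls early on the ladder costs far less.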

Relationship between CC:Ladder & CodeClash

For Pokémon fans, CC:Ladder is the equivalent of the Elite Four battles (and for the real aficionados, CC:Ladder is heavily inspired by the Trainer Tower). CodeClash is the real-world Video Game Championships, where individuals compete against other humans rather than a static bot.

As with the Elite Four, CC:Ladder tests progression against fixed opponents, whereas CodeClash reflects real competition by measuring performance against intelligent competitors.

We recommend CC:Ladder be treated as a proper evaluation as well. Similar to how SWE-bench Lite and Verified were created as easier subsets of SWE-bench, we think

CodeClash remains the north-star evaluation: competing against dynamic, intelligent opponents is more challenging than competing against static solutions. However, given models' rather dismal current ability to code against smart rivals over long horizons, we introduce CC:Ladder as a stepping stone toward such capabilities.