
555 points by maheshrijal | 1 comment
brap No.43707838
Where's the comparison with Gemini 2.5 Pro?
gallerdude No.43707897
For coding, I like the Aider polyglot benchmark, since it covers multiple programming languages.

Gemini 2.5 Pro got 72.9%

o3 (high) gets 81.3%, o4-mini (high) gets 68.9%

croemer No.43709557
Isn't it easy to train on the specific Exercism exercises that this benchmark uses?