
555 points maheshrijal | 2 comments
brap ◴[] No.43707838[source]
Where's the comparison with Gemini 2.5 Pro?
replies(3): >>43707846 #>>43707897 #>>43708606 #
gallerdude ◴[] No.43707897[source]
For coding, I like the Aider polyglot benchmark, since it covers multiple programming languages.

Gemini 2.5 Pro got 72.9%

o3 high gets 81.3%, o4-mini high gets 68.9%

replies(4): >>43708090 #>>43708632 #>>43709557 #>>43709763 #
vessenes ◴[] No.43708632[source]
Where do you find those o3 high numbers? https://aider.chat/docs/leaderboards/ currently has Gemini 2.5 Pro as the leader at, as you say, 72.9%.
replies(1): >>43708984 #
1. re-thc ◴[] No.43708984[source]
It's in the OpenAI announcement post (OP), i.e. OpenAI ran the Aider benchmark themselves.
replies(1): >>43730783 #
2. vessenes ◴[] No.43730783[source]
Update: the leaderboard now has o3 high + 4o at the top of the charts with 82.7%. This is a) amazing and b) 20x more expensive than Gemini.