(openai.com)

555 points maheshrijal | 2 comments | 16 Apr 25 17:01 UTC

brap ◴[16 Apr 25 17:11 UTC] No.43707838[source]▶

Where's the comparison with Gemini 2.5 Pro?

gallerdude ◴[16 Apr 25 17:16 UTC] No.43707897[source]▶

For coding, I like the Aider polyglot benchmark, since it covers multiple programming languages.

Gemini 2.5 Pro got 72.9%

o3 high gets 81.3%, o4-mini high gets 68.9%

vessenes ◴[16 Apr 25 18:16 UTC] No.43708632[source]▶

Where do you find those o3 high numbers? https://aider.chat/docs/leaderboards/ currently has Gemini 2.5 Pro as the leader at, as you say, 72.9%.

replies(1): >>43708984 #

1. re-thc ◴[16 Apr 25 18:49 UTC] No.43708984[source]▶

It's in the OpenAI article (the OP), i.e. OpenAI ran the Aider benchmark themselves.

replies(1): >>43730783 #

2. vessenes ◴[18 Apr 25 18:43 UTC] No.43730783[source]▶

Update: the leaderboard now has o3 high + 4o at the top of the charts with 82.7%. This is a) amazing and b) 20x more expensive than Gemini.

OpenAI o3 and o4-mini