(openai.com)

555 points maheshrijal | 1 comments | 16 Apr 25 17:01 UTC

georgewsinger ◴[16 Apr 25 17:20 UTC] No.43707951[source]▶

Very impressive! But on arguably the most important benchmark -- SWE-bench Verified, for real-world coding tasks -- Claude 3.7 remains the champion.[1]

Incredible how resilient Claude models have been as best-in-class for coding.

[1] But by only about 1%, and inclusive of Claude's "custom scaffold" augmentation (which in practice I assume almost no one uses?). The new OpenAI models might still be effectively best in class now (or likely beating Claude with similar augmentation?).

1. knes ◴[17 Apr 25 04:42 UTC] No.43713202[source]▶

>>43707951 #

Right now the SWE-bench leader, Augment Agent, still uses Claude 3.7 in combination with o1. https://www.augmentcode.com/blog/1-open-source-agent-on-swe-...

The findings are open-sourced in a repo too: https://github.com/augmentcode/augment-swebench-agent

OpenAI o3 and o4-mini