←back to thread

555 points maheshrijal | 1 comments | | HN request time: 0.31s | source
Show context
georgewsinger ◴[] No.43707951[source]
Very impressive! But under arguably the most important benchmark -- SWE-bench verified for real-world coding tasks -- Claude 3.7 still remains the champion.[1]

Incredible how resilient Claude models have been for best-in-coding class.

[1] But by only about 1%, and inclusive of Claude's "custom scaffold" augmentation (which in practice I assume almost no one uses?). The new OpenAI models might still be effectively best in class now (or likely beating Claude with similar augmentation?).

replies(7): >>43708008 #>>43708068 #>>43708249 #>>43708545 #>>43709203 #>>43713202 #>>43716307 #
oofbaroomf ◴[] No.43708008[source]
Claude got 63.2% according to the swebench.com leaderboard (listed as "Tools + Claude 3.7 Sonnet (2025-02-24)).[0] OpenAI said they got 69.1% in their blog post.

[0] swebench.com/#verified

replies(3): >>43708246 #>>43708263 #>>43708363 #
1. awestroke ◴[] No.43708246[source]
OpenAI have not shown themselves to be trustworthy, I'd take their claims with a few solar masses of salt