2127 points bakugo | 1 comments
anotherpaulg ◴[] No.43164684[source]
Claude 3.7 Sonnet scored 60.4% on the aider polyglot leaderboard [0], WITHOUT USING THINKING.

Tied for 3rd place with o3-mini-high. Sonnet 3.7 has the highest non-thinking score, taking that title from Sonnet 3.5.

Aider 0.75.0 is out with support for 3.7 Sonnet [1].

Thinking support and thinking benchmark results coming soon.

[0] https://aider.chat/docs/leaderboards/

[1] https://aider.chat/HISTORY.html#aider-v0750

replies(18): >>43164827 #>>43165382 #>>43165504 #>>43165555 #>>43165786 #>>43166186 #>>43166253 #>>43166387 #>>43166478 #>>43166688 #>>43166754 #>>43166976 #>>43167970 #>>43170020 #>>43172076 #>>43173004 #>>43173088 #>>43176914 #
anotherpaulg ◴[] No.43166754[source]
Using up to 32k thinking tokens, Sonnet 3.7 set SOTA with a 64.9% score.

  65% Sonnet 3.7, 32k thinking
  64% R1+Sonnet 3.5
  62% o1 high
  60% Sonnet 3.7, no thinking
  60% o3-mini high
  57% R1
  52% Sonnet 3.5
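
For a sense of scale, the gap between the two Sonnet 3.7 rows can be worked out from the unrounded figures quoted in this thread (64.9% with thinking, 60.4% without). This is only a minimal sketch of that arithmetic; the function name `thinking_gain` and the dictionary layout are illustrative assumptions, not anything taken from aider or the leaderboard.

```python
# Scores quoted in this thread (percent on the aider polyglot leaderboard).
scores = {
    "Sonnet 3.7, 32k thinking": 64.9,
    "Sonnet 3.7, no thinking": 60.4,
}


def thinking_gain(with_thinking: float, without_thinking: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent).

    Hypothetical helper for illustration only.
    """
    absolute = with_thinking - without_thinking
    relative = 100.0 * absolute / without_thinking
    return round(absolute, 1), round(relative, 1)


abs_gain, rel_gain = thinking_gain(
    scores["Sonnet 3.7, 32k thinking"],
    scores["Sonnet 3.7, no thinking"],
)
print(f"+{abs_gain} points ({rel_gain}% relative)")
```

On these two numbers, thinking adds 4.5 percentage points, or about 7.5% relative to the non-thinking score.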
replies(4): >>43167134 #>>43168719 #>>43168852 #>>43169016 #
mikae1 ◴[] No.43168852[source]
It's clear that progress is incremental at this point. At the same time Anthropic and OpenAI are bleeding money.

It's unclear to me how they'll shift to making money while providing almost no enhanced value.

replies(1): >>43168989 #
khafra ◴[] No.43168989[source]
Yudkowsky just mentioned that even if LLM progress stopped right here, right now, there are enough fundamental economic changes to provide us a really weird decade. Even with no moat, if the labs are in any way placed to capture a little of the value they've created, they could make high multiples of their investors' money.
replies(5): >>43169795 #>>43169803 #>>43170002 #>>43171064 #>>43175528 #
zeroq ◴[] No.43175528[source]
It's an echo chamber.

It's, what, the fifth anniversary of "the world will be a completely different place in 6 months due to AI advancement"?

"Sam Altman believes AI will change the world" - of course he does, what else is he supposed to say?

replies(1): >>43176101 #
1. CamperBob2 ◴[] No.43176101[source]
It is a different place. You just haven't noticed yet.

At some point fairly recently, we passed the point at which things that took longer than anyone thought they would take are happening faster than anyone thought they would happen.