Claude 3.7 Sonnet and Claude Code

(www.anthropic.com)

2127 points bakugo | 1 comments | 24 Feb 25 18:28 UTC

anotherpaulg ◴[24 Feb 25 20:40 UTC] No.43164684

>>43163011 (OP)

Claude 3.7 Sonnet scored 60.4% on the aider polyglot leaderboard [0], WITHOUT USING THINKING.

Tied for 3rd place with o3-mini-high. Sonnet 3.7 has the highest non-thinking score, taking that title from Sonnet 3.5.

Aider 0.75.0 is out with support for 3.7 Sonnet [1].

Thinking support and thinking benchmark results coming soon.

[0] https://aider.chat/docs/leaderboards/

[1] https://aider.chat/HISTORY.html#aider-v0750

anotherpaulg ◴[25 Feb 25 00:46 UTC] No.43166754

>>43164684

Using up to 32k thinking tokens, Sonnet 3.7 set SOTA with a 64.9% score.

  65% Sonnet 3.7, 32k thinking
  64% R1+Sonnet 3.5
  62% o1 high
  60% Sonnet 3.7, no thinking
  60% o3-mini high
  57% R1
  52% Sonnet 3.5

mikae1 ◴[25 Feb 25 06:31 UTC] No.43168852

>>43166754

It's clear that progress is incremental at this point. At the same time, Anthropic and OpenAI are bleeding money.

It's unclear to me how they'll shift to making money while providing almost no enhanced value.

khafra ◴[25 Feb 25 06:52 UTC] No.43168989

>>43168852

Yudkowsky just mentioned that even if LLM progress stopped right here, right now, there are enough fundamental economic changes to provide us a really weird decade. Even with no moat, if the labs are in any way placed to capture a little of the value they've created, they could make high multiples of their investors' money.

jonplackett ◴[25 Feb 25 09:18 UTC] No.43169795

>>43168989

Yep, totally agree. It will also depend on who captures the most eyeballs.

ChatGPT is already my default first place to check something; for the previous 20+ years, that was Google.

sarchertech ◴[25 Feb 25 12:37 UTC] No.43171092

>>43169795

Eyeballs aren’t enough, though. Unlike Google, ChatGPT is very expensive to run. It’s unlikely they can just slap ads on it like Google did.

AJ007 ◴[25 Feb 25 15:06 UTC] No.43172802

>>43171092

Inference costs will keep dropping. The stuff the average consumer does will be trivially cheap. More stuff will move on device. The edge capabilities of these models are already far beyond what the average person can use or comprehend.

The point I wonder about is whether it's sustainable for every query to trigger 30+ requests. Site owners aren't ready to have 98% of their requests be non-monetizable bot traffic. However, sites that have something to sell are.
