←back to thread

2127 points bakugo | 2 comments | | HN request time: 0.423s | source
Show context
anotherpaulg ◴[] No.43164684[source]
Claude 3.7 Sonnet scored 60.4% on the aider polyglot leaderboard [0], WITHOUT USING THINKING.

Tied for 3rd place with o3-mini-high. Sonnet 3.7 has the highest non-thinking score, taking that title from Sonnet 3.5.

Aider 0.75.0 is out with support for 3.7 Sonnet [1].

Thinking support and thinking benchmark results coming soon.

[0] https://aider.chat/docs/leaderboards/

[1] https://aider.chat/HISTORY.html#aider-v0750

replies(18): >>43164827 #>>43165382 #>>43165504 #>>43165555 #>>43165786 #>>43166186 #>>43166253 #>>43166387 #>>43166478 #>>43166688 #>>43166754 #>>43166976 #>>43167970 #>>43170020 #>>43172076 #>>43173004 #>>43173088 #>>43176914 #
anotherpaulg ◴[] No.43166754[source]
Using up to 32k thinking tokens, Sonnet 3.7 set SOTA with a 64.9% score.

  65% Sonnet 3.7, 32k thinking
  64% R1+Sonnet 3.5
  62% o1 high
  60% Sonnet 3.7, no thinking
  60% o3-mini high
  57% R1
  52% Sonnet 3.5
replies(4): >>43167134 #>>43168719 #>>43168852 #>>43169016 #
vessenes ◴[] No.43169016[source]
Paul, I saw in the notes that using claude with thinking mode requires yml config updates -- any pointers here? I was parsing some commits, and I couldn't tell if you only added architect support through openrouter. Thanks!
replies(1): >>43172307 #
1. anotherpaulg ◴[] No.43172307[source]
Here are the current docs for changing the thinking token limits.

https://aider.chat/docs/llms/anthropic.html#thinking-tokens

I'll make this less clunky soon.

replies(1): >>43173340 #
2. vessenes ◴[] No.43173340[source]
Thanks. FWIW, it feels to me like this would be best as a global setting, not per-repo? Or, I guess it might be more aider-y to have sane defaults in the app and command line changes. Anyway, happily plugging away with the architect settings now!