
2127 points bakugo | 1 comment
anotherpaulg ◴[] No.43164684[source]
Claude 3.7 Sonnet scored 60.4% on the aider polyglot leaderboard [0], WITHOUT USING THINKING.

Tied for 3rd place with o3-mini-high. Sonnet 3.7 has the highest non-thinking score, taking that title from Sonnet 3.5.

Aider 0.75.0 is out with support for 3.7 Sonnet [1].

Thinking support and thinking benchmark results coming soon.

[0] https://aider.chat/docs/leaderboards/

[1] https://aider.chat/HISTORY.html#aider-v0750

gwd ◴[] No.43165555[source]
Interesting that the "correct diff format" score went from 99.6% with Claude 3.5 to 93.3% for Claude 3.7. My experience with using claude-code was that it consistently required several tries to get the right diff. Hopefully all that will improve as they get things ironed out.
Sterling9x ◴[] No.43168693[source]
That's a file context problem because you use Cursor or Cline or some other crap context maker. Try Clood.

Unless there's an "anthropic high usage" incident, which I just watch the incident reports for, I one-shot features regularly.

At a high skill level. Not front end. Back-end C# in a small but great framework that has poor documentation. Not just endpoints but full-on task queues.

So really, it's a context problem. You're just not laser focusing your context.

Try this:

Set up a context with the exact files needed. Sure, AI "should" do that, but it doesn't. Especially not Cursor or Cline. Then try.
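A minimal sketch of what "a context with the exact files needed" could look like, assuming you paste the result into the model yourself. The `build_context` helper and the `=== path ===` separator are hypothetical, made up for illustration; this is not how Clood, Cursor, Cline, or aider assemble context.

```python
from pathlib import Path


def build_context(paths, root="."):
    """Concatenate a hand-picked list of files into one context string.

    Hypothetical helper: only the files you name are included, in the
    order you name them. Nothing is discovered automatically.
    """
    root = Path(root)
    sections = []
    for rel in paths:
        text = (root / rel).read_text(encoding="utf-8")
        # Label each file so it is clear where one ends and the next begins.
        sections.append(f"=== {rel} ===\n{text.rstrip()}\n")
    return "\n".join(sections)
```

Calling `build_context(["Queue.cs", "Worker.cs"], root="src")` (file names are placeholders) returns a single string with just those two files, which keeps the prompt limited to what the task actually touches.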

Hell, try it with clood after I update it for 3.7. I bet you, if you clood file it, you get one-shots.

I have a long history of commits in my projects that are clood one-shots.

1. nuancebydefault ◴[] No.43176406[source]
Ah, the issue is contextual flux in your Clood-Cline stack. Just quantum defrag the file vectors, reverse-polarize the delta stream, and inject a neural bypass. If that fails, reboot the universe. One-shot cloodfile guaranteed.

/i