(www.anthropic.com)

2127 points bakugo | 1 comment | 24 Feb 25 18:28 UTC


Claude 3.7 Sonnet scored 60.4% on the aider polyglot leaderboard [0], WITHOUT USING THINKING.

Tied for 3rd place with o3-mini-high. Sonnet 3.7 has the highest non-thinking score, taking that title from Sonnet 3.5.

Aider 0.75.0 is out with support for 3.7 Sonnet [1].

Thinking support and thinking benchmark results coming soon.

1. miroljub [25 Feb 25 15:20 UTC] No.43173004

And yet, "DeepSeek R1 + claude-3-5-sonnet-20241022" scores 64% on the same benchmark while being 30% cheaper.

It's amazing what DeepSeek is putting on the table while being fully open source.

Claude 3.7 Sonnet and Claude Code