
2127 points by bakugo | 1 comment
jumploops No.43163548
> "[..] in developing our reasoning models, we’ve optimized somewhat less for math and computer science competition problems, and instead shifted focus towards real-world tasks that better reflect how businesses actually use LLMs.”

This is good news. OpenAI seems to be aiming towards "the smartest model," but in practice, LLMs are used primarily as learning aids, data transformers, and code writers.

Balancing "intelligence" with "get shit done" seems to be the sweet spot, and afaict it's one of the reasons the current crop of developer tools (Cursor, Windsurf, etc.) prefers Claude 3.5 Sonnet over 4o.

replies(4): >>43163694 #>>43164052 #>>43164203 #>>43164889 #
crowcroft No.43164052
Sometimes I wonder if there is overfitting to benchmarks (DeepSeek seems like the worst offender to me).

Claude is pretty consistently the chat I go back to; its responses subjectively seem better to me, regardless of where the model actually lands on benchmarks.

replies(2): >>43164229 #>>43165763 #
FergusArgyll No.43165763
Yeah, Claude crushes the smell test.