
2127 points by bakugo
jumploops:
> "[..] in developing our reasoning models, we’ve optimized somewhat less for math and computer science competition problems, and instead shifted focus towards real-world tasks that better reflect how businesses actually use LLMs.”

This is good news. OpenAI seems to be aiming towards "the smartest model," but in practice, LLMs are used primarily as learning aids, data transformers, and code writers.

Balancing "intelligence" with "get shit done" seems to be the sweet spot, and afaict one of the reasons the current crop of developer tools (Cursor, Windsurf, etc.) prefer Claude 3.5 Sonnet over 4o.

bicx:
Claude 3.5 has been fantastic in Windsurf. However, it does cost credits. DeepSeek V3 is now available in Windsurf at zero credit cost, which was a major shift for the company. Great to have more options either way.

I’d highly recommend anyone check out Windsurf’s Cascade feature for agentic-like code writing and exploration. It helped save me many hours in understanding new codebases and tracing data flows.

throwup238:
DeepSeek’s models are vastly overhyped (FWIW I have access to them via Kagi, Windsurf, and Cursor - I regularly run the same tests on all three). I don’t think it matters that V3 is free when even R1, with its extra compute budget, is inferior to Claude 3.5 by a large margin - at least in my experience with both bog-standard React/Svelte frontend code and more complex C++/Qt components. After only half an hour of using Claude 3.7, I find the code output is superior and the thinking output is in a completely different universe (YMMV and caveat emptor).

For example, DeepSeek’s models almost always smash together C++ headers and code files, even with Qt, which is an absolutely egregious error because of the meta-object compiler (moc) preprocessor step. The moc has been around for at least 15 years and is all over the training data, so there’s no excuse.
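
To make the failure mode concrete, here’s a minimal sketch of the conventional Qt split (CounterWidget is a made-up class for illustration, not code from any real project):

    // counterwidget.h -- Q_OBJECT classes conventionally live in a header
    // so the meta-object compiler (moc) picks them up from the build system.
    #pragma once
    #include <QWidget>

    class CounterWidget : public QWidget {
        Q_OBJECT  // moc generates the signal/slot and meta-object code from this macro
    public:
        explicit CounterWidget(QWidget *parent = nullptr);
    signals:
        void countChanged(int value);
    };

    // counterwidget.cpp -- implementation only; no Q_OBJECT class is defined here.
    #include "counterwidget.h"

    CounterWidget::CounterWidget(QWidget *parent) : QWidget(parent) {}

    // If the whole class were smashed into this .cpp instead, you would also need
    //     #include "counterwidget.moc"
    // at the end of the file (with CMake AUTOMOC or qmake), otherwise the moc output
    // never gets compiled and the link fails with "undefined reference to vtable
    // for CounterWidget" - exactly the kind of build break the merged output causes.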

bionhoward:
The big difference is that DeepSeek R1 has a permissive license, whereas Claude has a nightmare “closed output” customer noncompete license that makes it unusable for work unless you accept not competing with your intelligence supplier, which sounds dumb.

Aeolun:
Do most people have an expectation of competing with Claude?

woah:
Seems like that must make it impossible for the Cursor devs to use their own product, given that Claude is the default there.