2127 points by bakugo | 1 comment
jumploops No.43163548
> “[..] in developing our reasoning models, we’ve optimized somewhat less for math and computer science competition problems, and instead shifted focus towards real-world tasks that better reflect how businesses actually use LLMs.”

This is good news. OpenAI seems to be aiming towards "the smartest model," but in practice, LLMs are used primarily as learning aids, data transformers, and code writers.

Balancing "intelligence" with "get shit done" seems to be the sweet spot, and afaict it's one of the reasons the current crop of developer tools (Cursor, Windsurf, etc.) prefers Claude 3.5 Sonnet over 4o.

bicx No.43163694
Claude 3.5 has been fantastic in Windsurf. However, it does cost credits. DeepSeek V3 is now available in Windsurf at zero credit cost, which was a major shift for the company. Great to have multiple options either way.

I’d highly recommend that anyone check out Windsurf’s Cascade feature for agentic code writing and exploration. It has saved me many hours understanding new codebases and tracing data flows.

ai-christianson No.43163786
I'm working on an OSS agent called RA.Aid, and Claude 3.7 is anecdotally a huge improvement.

About to push a new release that makes it the default.

It costs money, but if you're writing code to make money, it's totally worth it.