
467 points mraniki | 4 comments
neal_ ◴[] No.43534543[source]
I was using Gemini 2.5 Pro yesterday and it does seem decent. I still think Claude 3.5 is better at following instructions than the new 3.7 model, which just goes ham messing stuff up. Really disappointed by Cursor and the Claude CLI tool; for me they create more problems than they fix. I can't figure out how to use them on any of my projects without them ruining the project and creating terrible tech debt. I really like the way Gemini shows how much context window is left; I think every company should have this.

To be honest, I think there has been no major improvement beyond the original models which gained popularity first. It's just marginal improvements, 10% better or something, and free models like DeepSeek are actually better, imo, than anything OpenAI has. I don't think the market can withstand the valuations of the big AI companies. They have no advantage, their models are worse than free open-source ones, and they charge money??? Where is the benefit of their product??

People originally said the models are the moat and the methods are top secret, but it turns out it's pretty easy to reproduce these models, and it's the application layer built on top of the models that is much more specific and has the real moat. People said the models would engulf the applications built on top and just integrate natively.
replies(4): >>43534760 #>>43534894 #>>43535115 #>>43536010 #
1. mountainriver ◴[] No.43535115[source]
My whole team feels like 3.7 is a letdown. It really struggles to follow instructions as others are mentioning.

Makes me think they really just hacked the benchmarks on this one.

replies(2): >>43535367 #>>43538050 #
2. ignoramous ◴[] No.43535367[source]
Claude Sonnet 3.7 Thinking is also an unmitigated disaster for coding. I was mistaken to assume a "thinking" model would be better at logic. It turns out "thinking" is a marketing term, a euphemism for "hallucinating" ... though that's not surprising once you actually look at the model cards for these "reasoning" / "thinking" LLMs. That said, I've found these models work nicely for IR (information retrieval).
replies(1): >>43544581 #
3. dimitri-vs ◴[] No.43538050[source]
They definitely over-optimized it for agentic use, where the quality of the code doesn't matter as much as its ability to run, even if just barely. When you view it from that perspective, all the nested error handling, excessive comments, 10 lines that could be done in 2, etc. start to make sense.
4. theshrike79 ◴[] No.43544581[source]
Overthinking without extra input is always bad.

It's super bad for humans too: you start to spiral down a dark path when your thoughts run away, making up theories, basing more theories on those, and so on.