470 points mraniki | 22 comments
1. neal_ ◴[] No.43534543[source]
I was using Gemini 2.5 Pro yesterday and it does seem decent. I still think Claude 3.5 is better at following instructions than the new 3.7 model, which just goes ham messing stuff up. Really disappointed by Cursor and the Claude CLI tool; for me they create more problems than they fix. I can't figure out how to use them on any of my projects without them ruining the project and creating terrible tech debt.

I really like the way Gemini shows how much context window is left; I think every company should have this. To be honest, I think there has been no major improvement beyond the original models which gained popularity first. It's just marginal improvements, 10% better or something, and the free models like DeepSeek are actually better IMO than anything OpenAI has.

I don't think the market can withstand the valuations of the big AI companies. They have no advantage, their models are worse than free open source ones, and they charge money??? Where is the benefit of their product?? People originally said the models are the moat and the methods are top secret, but it turns out it's pretty easy to reproduce these models, and it's the application layer built on top of the models that is much more specific and has the real moat. People said the models would engulf these applications built on top and just integrate natively.
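
A context meter isn't even hard to build. A rough sketch (names hypothetical, not any vendor's API; assumes the common ~4-characters-per-token heuristic, where a real client would use the provider's own tokenizer):

    // Rough sketch of a context meter (hypothetical names, no vendor API).
    // Assumes ~4 characters per token; a real client would count tokens
    // with the provider's tokenizer instead of this heuristic.
    function contextRemaining(history: string[], maxTokens: number): string {
      const chars = history.reduce((n, msg) => n + msg.length, 0);
      const used = Math.ceil(chars / 4);
      const left = Math.max(0, maxTokens - used);
      const pct = ((left / maxTokens) * 100).toFixed(0);
      return `~${left} tokens (${pct}%) of context left`;
    }

    // e.g. contextRemaining(messages, 1_000_000) for a 1M-token window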
replies(4): >>43534760 #>>43534894 #>>43535115 #>>43536010 #
2. cjonas ◴[] No.43534760[source]
My only experience is via Cursor, but I'd agree that in that context 3.7 is worse than 3.5. 3.7 goes crazy trying to fix any little linter error and often gets confused and will just hammer away, making things worse until I stop generation. I think if I let it continue it would probably propose rm -rf and start over at some point :).

Again, this could just have to do with the way cursor is prompting it.

replies(4): >>43535188 #>>43535734 #>>43535794 #>>43537180 #
3. vlovich123 ◴[] No.43534894[source]
Have you tried Windsurf? I've been really enjoying it and wondering if they do something on top to make it work better. The AI definitely still gets into weird rabbit holes and sometimes even injects security bugs (it kept trying to add sandbox permissions to an iframe), but at least for UI work it's been an accelerant.
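
For context, the iframe thing looked something like this (a hypothetical reconstruction of the kind of change it kept proposing, not Windsurf's actual output):

    // Hypothetical reconstruction of the "fix" the model kept proposing.
    const frame = document.createElement("iframe");
    frame.src = "https://third-party.example/widget";

    // Safe default: an empty sandbox value applies every restriction.
    frame.setAttribute("sandbox", "");

    // What it kept suggesting instead. Granting allow-scripts together
    // with allow-same-origin is dangerous: if the framed document ends up
    // same-origin with the parent, its scripts can reach up and remove
    // the sandbox attribute entirely, defeating it.
    // frame.setAttribute("sandbox", "allow-scripts allow-same-origin");

    document.body.appendChild(frame);
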
4. mountainriver ◴[] No.43535115[source]
My whole team feels like 3.7 is a letdown. It really struggles to follow instructions as others are mentioning.

Makes me think they really just hacked the benchmarks on this one.

replies(2): >>43535367 #>>43538050 #
5. runekaagaard ◴[] No.43535188[source]
I'm getting great and stable results with 3.7 on Claude Desktop and MCP servers.

It feels like an upgrade from 3.5.

6. ignoramous ◴[] No.43535367[source]
Claude Sonnet 3.7 Thinking is also an unmitigated disaster for coding. I was mistaken to assume a "thinking" model would be better at logic. It turns out "thinking" is a marketing term, a euphemism for "hallucinating"... though not surprising when you actually take a look at the model cards for these "reasoning" / "thinking" LLMs. That said, I've found them to work nicely for IR (information retrieval).
replies(1): >>43544581 #
7. travisgriggs ◴[] No.43535734[source]
So glad to see this!! I thought it was just me!

The latest updates, I’m often like “would you just hold the f#^^ on trigger?!? Take a chill pill already”

8. theshrike79 ◴[] No.43535794[source]
I asked claude 3.7 to move a perfectly working module to another location.

What did it do?

A COMPLETE FUCKING REWRITE OF THE MODULE.

The result did work, thanks to unit tests etc., but still, it has a habit of going down the rabbit hole of fixing and changing 42 different things when you ask for one change.

9. martin-t ◴[] No.43536010[source]
Whenever I read about LLMs or try to use them, I feel like I am asleep in a dream where two contradicting things can be true at the same time.

On one hand, you have people claiming "AI" can now do SWE tasks which take humans 30 minutes or 2 hours and the time doubles every X months so by Y year, SW development will be completely automated.

On the other hand, you have people saying exactly what you are saying. Usually that LLMs have issues even with small tasks and that repeated/prolonged use generates tech debt even if they succeed on the small tasks.

These two views clearly can't both be true at the same time. My experience falls in the second category, so I'd like to chalk up the first as marketing hype, but it's confusing how many people who seemingly have nothing to gain from the hype contribute to it.

replies(4): >>43536241 #>>43536654 #>>43537271 #>>43537992 #
10. aleph_minus_one ◴[] No.43536241[source]
> Whenever I read about LLMs or try to use them, I feel like I am asleep in a dream where two contradicting things can be true at the same time.

This is called "paraconsistent logic":

* https://en.wikipedia.org/wiki/Paraconsistent_logic

* https://plato.stanford.edu/entries/logic-paraconsistent/

11. radicality ◴[] No.43536654[source]
At first I thought you were going to talk about how various LLMs will gaslight you and say something is true, then only change their mind once you provide a counterexample, and when challenged with it, respond "I obviously meant it's mostly true; in that specific case it's false".
12. heed ◴[] No.43537180[source]
Believe it or not, I had Cursor in yolo mode just for fun recently and 3.7 rm -rf'd my home folder :(
replies(1): >>43538121 #
13. frankohn ◴[] No.43537271[source]
> people claiming "AI" can now do SWE tasks which take humans 30 minutes or 2 hours

Yes, people claim that, but everyone with a grain of sense knows this is not true. Yes, in some cases an LLM can write a demo-like Python or web application from scratch, and that looks impressive, but it is still far from really replacing a SWE. The real world is messy and requires care. It requires planning, making some modifications, getting feedback, proceeding or going back to the previous step, and thinking about it again. Even when a change works you still need to go back, double-check, make improvements, remove stuff, fix errors, treat corner cases.

The LLM doesn't do this; it tries to do everything in one single step. Yes, even in "thinking" mode it thinks ahead and explores a few possibilities, but it doesn't do several iterations as would be needed in many cases. It does a first write, as a brilliant programmer might do in one attempt, but it doesn't review its own work. The idea of feeding the error back to the LLM so that it will fix it works in simple cases, but in the more common, more complex cases, it leads to catastrophes.
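
That feed-the-error-back loop amounts to something like this (a sketch; the llm and runTests functions are stand-ins for whatever API and test runner you use, not any real library):

    // Sketch of the naive feed-the-error-back loop (all names hypothetical).
    type Llm = (prompt: string) => Promise<string>;
    type RunResult = { ok: boolean; error?: string };

    async function fixItLoop(
      llm: Llm,
      runTests: (code: string) => Promise<RunResult>,
      code: string,
      maxAttempts = 5,
    ): Promise<string | null> {
      for (let attempt = 0; attempt < maxAttempts; attempt++) {
        const result = await runTests(code);
        if (result.ok) return code; // tests pass: done
        // Feed the raw error straight back. Works on simple bugs; on a
        // messy codebase each "fix" tends to drift further from the goal.
        code = await llm(
          `This code fails with:\n${result.error ?? "unknown error"}\n\nFix it:\n${code}`,
        );
      }
      return null; // gave up; a human has to step back and re-plan
    }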

Also, when dealing with legacy code it is much more difficult for an LLM, because it has to cope with the existing code and all its idiosyncrasies. In that case one needs a deep understanding of what the code is doing and some well-thought-out planning to modify it without breaking everything, and the LLM is usually bad at that.

In short, LLMs are a wonderful technology, but they are not yet the silver bullet some pretend them to be. Use one as an assistant on specific tasks where the scope is small and the requirements well-defined; that is the domain where it excels and is actually useful. You can also use it to get a good starting point in a domain you are not familiar with, or for help when you are stuck on some problem. Attempts to give the LLM a task too big or too complex are doomed to failure, and you will be frustrated and waste your time.

14. bitcrusher ◴[] No.43537992[source]
I'm not sure why this is confusing? We're seeing the phenomenon everywhere in culture lately. People WANT something to be true and try to speak it into existence. They also tend to be the people LEAST qualified to speak about the thing they are referencing. It's not marketing hype, it is propaganda.

Meanwhile, the 'experts' are saying something entirely different and being told they're wrong or worse, lying.

I'm sure you've seen it before, but this propaganda, in particular, is the holy grail of 'business people': the "have a great idea, just need you to do all the work" types. This has been going on since the late 70s, early 80s.

replies(1): >>43541254 #
15. dimitri-vs ◴[] No.43538050[source]
They definitely over-optimized it for agentic use, where the quality of the code doesn't matter as much as its ability to run, even if just barely. When you view it from that perspective, all the nested error handling, excessive comments, 10 lines that could be done in 2, etc. start to make sense.
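
A made-up before/after of the pattern (illustrative only, not actual 3.7 output):

    // The over-engineered version: defensive checks and error handling
    // wrapped around code that cannot realistically throw.
    function getUserNames(users: { name?: string }[]): string[] {
      const names: string[] = [];
      try {
        for (const user of users) {
          if (user && user.name !== undefined && user.name !== "") {
            names.push(user.name);
          }
        }
      } catch (err) {
        console.error("Failed to collect user names:", err);
      }
      return names;
    }

    // The two lines it could have been:
    const getUserNamesConcise = (users: { name?: string }[]): string[] =>
      users.flatMap((u) => (u.name ? [u.name] : []));
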
16. neal_ ◴[] No.43538121{3}[source]
That's crazy! I hadn't heard of yolo mode?? Don't they restrict access to the project? But I guess the terminal is unrestricted? lol, I wonder what it was trying to do.
replies(1): >>43560811 #
17. martin-t ◴[] No.43541254{3}[source]
Not necessarily confusing, but very frustrating. This is probably the first time I have encountered such a wide range of opinions, and therefore such a wide range of uncertainty, in a topic close to me.

When a bunch of people very loudly and confidently say that your profession, something you're very good at, will become irrelevant in the next few years, it makes you pay attention. And when you can't see what they claim to be seeing, it makes you question whether something is wrong with you or with them.

replies(1): >>43548219 #
18. theshrike79 ◴[] No.43544581{3}[source]
Overthinking without extra input is always bad.

It's super bad for humans too. You start to spiral down a dark path when your thoughts run away, making up theories and basing more theories on those, etc.

19. bitcrusher ◴[] No.43548219{4}[source]
Totally get that; I'm on the older side, so personally I've been down this road quite a few times. We're ALWAYS on the verge of our profession being rugged somehow: RAD tools, outsourcing, in-sourcing, no-code, AI/LLM... I used to be curious about why there was overwhelming pressure to eliminate "us", but gave up and now just focus on doing good work.
replies(1): >>43552583 #
20. martin-t ◴[] No.43552583{5}[source]
The pressure is simple: money. Competent people are rare and we're not cheap. But it turns out those cheaper, less competent people can't replace us, no matter what tools you give them; there is fundamental complexity to the work we do which they can't handle.

However, I think this time is qualitatively different. This time the rich people who wanna get rid of us are not trying to replace us with other people. This time, they are trying to simulate _us_ using machines. To make "us" faster, cheaper and scalable.

I don't think LLMs will lead to actual AI and their benefit is debatable. But so much money is going into the research that somebody might just manage to build actual AI and then what?

Hopefully, in 10 years we'll all be laughing at how a bunch of billionaires went bankrupt by trying to convince the world that autocomplete was AI. But if not, a whole bunch of people will be competing for a much smaller pool of jobs, making us all much, much poorer, while they will capture all the value that would have normally been produced by us right into their pockets.

replies(1): >>43558333 #
21. bitcrusher ◴[] No.43558333{6}[source]
I agree; I wasn't clear in my previous post. I understand the economic underpinnings. I cannot understand the coupled animus and have stopped trying.
22. heed ◴[] No.43560811{4}[source]
It had created a config file in my home dir, and I asked it to move it to the project folder, and apparently it thought deleting the entire home dir first was necessary? Not sure exactly, because after my home folder was gone things started disappearing lol