Again, this could just have to do with the way Cursor is prompting it.
Makes me think they really just hacked the benchmarks on this one.
It feels like an upgrade from 3.5
With the latest updates, I'm often like "would you just hold the f#^^ on, trigger?!? Take a chill pill already"
What did it do?
A COMPLETE FUCKING REWRITE OF THE MODULE.
The result did work, thanks to unit tests etc., but still, it has a habit of going down the rabbit hole of fixing and changing 42 different things when you ask for one change.
On one hand, you have people claiming "AI" can now do SWE tasks that take humans 30 minutes or 2 hours, that the time doubles every X months, and that by year Y software development will be completely automated.
On the other hand, you have people saying exactly what you are saying: usually that LLMs have issues even with small tasks, and that repeated/prolonged use generates tech debt even when they succeed on the small tasks.
These two views clearly can't both be true at the same time. My experience falls in the second category, so I'd like to chalk the first up to marketing hype, but it's confusing how many people who seemingly have nothing to gain from the hype contribute to it.
This is called "paraconsistent logic".
Yes, people claim that, but everyone with a grain of sense knows this is not true. Yes, in some cases an LLM can write a Python or web demo-like application from scratch, and that looks impressive, but it is still far from really replacing a SWE. The real world is messy and requires care. It requires planning, making some modifications, getting feedback, proceeding or going back to the previous step, and thinking about it again. Even when a change works you still need to go back, double-check, make improvements, remove stuff, fix errors, and handle corner cases.
The LLM doesn't do this; it tries to do everything in one single step. Yes, even when it is in "thinking" mode, it thinks ahead and explores a few possibilities, but it doesn't do the several iterations that would be needed in many cases. It does a first write the way a brilliant programmer might in one attempt, but it doesn't review its work. The idea of feeding the error back to the LLM so that it will fix it works in simple cases, but in the more common cases, where things are more complex, it leads to catastrophes.
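To make that loop concrete, here's a minimal sketch in Python of the naive "feed the error back" pattern being described. Everything here is illustrative, not any real tool's API: the llm callable is a hypothetical stand-in for whatever model you use, and it assumes your test suite exercises the generated candidate.py.

    import subprocess

    def generate_and_repair(llm, task, max_attempts=5):
        # Naive repair loop: generate code, run the tests, and if they
        # fail, hand the raw error output straight back to the model.
        # `llm` is any callable mapping a prompt string to code text.
        code = llm(f"Write code for this task:\n{task}")
        for _ in range(max_attempts):
            with open("candidate.py", "w") as f:
                f.write(code)
            result = subprocess.run(["pytest", "-q"],
                                    capture_output=True, text=True)
            if result.returncode == 0:
                return code  # tests pass, declare victory
            # No plan, no review, no backtracking: just blind retry.
            code = llm(f"The tests failed:\n{result.stdout}\n"
                       f"Fix this code:\n{code}")
        raise RuntimeError(f"still failing after {max_attempts} attempts")

Note what's missing: there is no step where it re-reads the diff, reconsiders the approach, or backs out a bad change, which is exactly the kind of iteration the parent comment says real work requires.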
Also, when dealing with legacy code it is much more difficult for an LLM, because it has to cope with the existing code and all its idiosyncrasies. In that case one needs a deep understanding of what the code is doing and some well-thought-out planning to modify it without breaking everything, and the LLM is usually bad at that.
In short, LLMs are a wonderful technology, but they are not yet the silver bullet some pretend them to be. Use them like an assistant to help you on specific tasks where the scope is small and the requirements well-defined; that is the domain where they excel and are actually useful. You can also use them to get a good starting point in a domain you are not familiar with, or for some real help when you are stuck on a problem. Attempts to give the LLM a task too big or too complex are doomed to failure, and you will be frustrated and waste your time.
Meanwhile, the 'experts' are saying something entirely different and being told they're wrong or, worse, lying.
I'm sure you've seen it before, but this propaganda, in particular, is the holy grail of 'business people': the "have a great idea, just need you to do all the work" types. This has been going on since the late 70s, early 80s.
When a bunch of people very loudly and confidently say your profession, something you're very good at, will become irrelevant in the next few years, it makes you pay attention. And when you then can't see what they claim to be seeing, it makes you question whether something is wrong with you or with them.
It's super bad for humans too. You start to spiral down a dark path when your thoughts run away, making up theories and basing more theories on those, etc.
However, I think this time is qualitatively different. This time the rich people who wanna get rid of us are not trying to replace us with other people. This time, they are trying to simulate _us_ using machines. To make "us" faster, cheaper and scalable.
I don't think LLMs will lead to actual AI and their benefit is debatable. But so much money is going into the research that somebody might just manage to build actual AI and then what?
Hopefully, in 10 years we'll all be laughing at how a bunch of billionaires went bankrupt trying to convince the world that autocomplete was AI. But if not, a whole bunch of people will be competing for a much smaller pool of jobs, making us all much, much poorer, while they capture, right into their own pockets, all the value that would normally have been produced by us.