Maybe we should collect all of these predictions, then go back in 5-10 years and see if anyone was actually right.
Overall, it echoes my experience with Claude Opus 4.5 in particular. We’ve passed a threshold (one of several, no doubt).
I'm glad the OP feels fine just letting Opus do whatever it wants without pausing to look under the covers, and perhaps we all have to learn to stop worrying and love the LLM? But I think what we're really witnessing, here and now, is just another hype article written by a professional blogger and speaker who's highly motivated to produce engagement bait like this.
Now do something like I did: an application that pulls your IMDB/Letterboxd/Goodreads/Steam libraries and stores them locally (own your data), and that also uses OMDB/TMDB to enrich the movie and TV show data.
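To give a flavour of the enrich-and-store step, here is a minimal sketch of what I mean (Python; the API key, SQLite schema, and field choices are placeholders of my own, not the code Claude actually produced):

```python
import sqlite3
import requests

TMDB_API_KEY = "YOUR_TMDB_KEY"  # placeholder key

def enrich_movie(title, year=None):
    """Look a title up on TMDB and return the first search result, if any."""
    resp = requests.get(
        "https://api.themoviedb.org/3/search/movie",
        params={"api_key": TMDB_API_KEY, "query": title, "year": year},
        timeout=30,
    )
    resp.raise_for_status()
    results = resp.json().get("results", [])
    return results[0] if results else None

def save_movie(db_path, movie):
    """Keep the enriched record in a local SQLite file (own your data)."""
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS movies "
            "(tmdb_id INTEGER PRIMARY KEY, title TEXT, release_date TEXT, overview TEXT)"
        )
        conn.execute(
            "INSERT OR REPLACE INTO movies VALUES (?, ?, ?, ?)",
            (movie["id"], movie["title"], movie.get("release_date"), movie.get("overview")),
        )
```

And that's one title from one source; multiply it across four libraries and a couple of metadata providers.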
If you can write all that code faster than you can read what Claude did, I salute you and will subscribe to your Substack and YouTube channels :)
Oh, by the way, neither Goodreads, IMDB, nor Letterboxd has a proper export API, so you need Playwright-style browser automation to do it. Just debugging that mess while writing all the code yourself is going to take hours and hours.
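For context, the Playwright-style export looks roughly like this (a sketch only; the URLs and selectors below are placeholders, and every site's login and export flow has to be reverse-engineered by hand, which is exactly the tedious part):

```python
from pathlib import Path
from playwright.sync_api import sync_playwright

# Placeholder URLs and selectors: the real ones have to be inspected per site.
LOGIN_URL = "https://letterboxd.com/sign-in/"
EXPORT_URL = "https://letterboxd.com/settings/data/"

def export_letterboxd(username, password, out_dir: Path) -> Path:
    """Drive a headless browser through login and data export, saving the file."""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()

        page.goto(LOGIN_URL)
        page.fill("input[name='username']", username)  # placeholder selector
        page.fill("input[name='password']", password)  # placeholder selector
        page.click("button[type='submit']")
        page.wait_for_load_state("networkidle")

        page.goto(EXPORT_URL)
        with page.expect_download() as download_info:
            page.click("text=Export your data")  # placeholder selector
        download = download_info.value

        target = out_dir / download.suggested_filename
        download.save_as(target)
        browser.close()
        return target
```

Now imagine debugging that against three different sites that change their markup whenever they feel like it.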
The Steam API access Claude one-shotted (with Sonnet 3.7, this was a long time ago) as well as enriching the input data from different sources.
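The Steam part, for comparison, is about this much work (the key and SteamID below are placeholders; GetOwnedGames is a real Steam Web API endpoint):

```python
import requests

STEAM_API_KEY = "YOUR_STEAM_KEY"   # placeholder
STEAM_ID = "7656119xxxxxxxxxx"     # placeholder 64-bit SteamID

def fetch_owned_games(api_key, steam_id):
    """Pull the owned-games list from the Steam Web API."""
    resp = requests.get(
        "https://api.steampowered.com/IPlayerService/GetOwnedGames/v0001/",
        params={
            "key": api_key,
            "steamid": steam_id,
            "include_appinfo": 1,             # include names, not just appids
            "include_played_free_games": 1,
            "format": "json",
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("response", {}).get("games", [])
```

That's the easy part; the browser-automation side is where the real hours go.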
Things evolve faster than people realize... First came agent mode, then MCP servers and sub-agents, and now it's RAG databases letting LLMs pull in data directly.
The development of LLMs looks slow, but each iteration brings improvements. Ask yourself: what would the result of those same tests have been 21 months ago with Claude 3.0? How about with Claude 4.0, which is only 8 months old?
Right now Opus 4.5 is darn functional. The issue is usually not the code it writes; more often it gets stuck on "it's too complex, let me simplify it", with the biggest limitation being context capacity.
LLMs are still bad at deeper tasks, but compared to earlier models the jumps have been enormous. What about a year from now? Two years? I have a hard time believing that Claude 3 is not even 2 years old but just 21 months. And we considered that a massive jump, useful for working on a single file... Now we are throwing entire codebases at it and it is darn good at debugging, editing, etc.
Do I like the results? Not always; there are plenty of times the output is not what I wanted, but that is often down to my own prompting being too generic.
LLMs are never going to truly replace experienced programmers, but boy, is the progress scary.
I think you need to parse my comment a little more keenly ;)
> The Steam API access Claude one-shotted (with Sonnet 3.7, this was a long time ago) as well as enriching the input data from different sources.
This story isn't different from the usual "I made a throwaway thing with an LLM, it was super useful, and it took me no time at all". That's very different from the OP claiming you can throw this at mature or legacy codebases and cut labour inputs by 90%. If the correctness of the code matters (and it will as the codebase grows), you still need to read the code, and that still takes human eyes.
(It wasn't clear in my comment, but I already use agents for my code. I just think the OP's claims are overblown.)