I agree that AI is inevitable. But there’s such a level of groupthink about it at the moment that everything is manifested as an agentic text box. I’m looking forward to discovering what comes after everyone moves on from that.
That is what I find so wild about the current conversation and debate. I have Claude Code toiling away right now building my personal organization software, which uses LLMs to take unstructured input and create my personal plans, projects, tasks, etc.
When someone uses an agent to increase their productivity by 10x in a real, production codebase that people actually get paid to work on, that will start to validate the hype. I don’t think we’ve seen any evidence of it; in fact, we’ve seen the opposite.
It is really the same kind of thing, but the model is usually "smarter" than a junior engineer. You can say something like "hmm, I think an event bus makes sense here," and the LLM will do it in 5 seconds. The problem is that there are certain behavioral biases that require active reminding. (I think some MCP integration work might resolve most of them, but this is just based on the current Claude Code and Opus/Sonnet 4 models.)
The types of tasks I have been putting Claude Code to work on are iterative changes on a medium complexity code base. I have an extensive Claude.md. I write detailed PRDs. I use planning mode to plan the implementation with Claude. After a bunch of iteration I end up with nicely detailed checklists that take quite a lot of time to develop but look like a decent plan for implementation. I turn Claude (Opus) loose and religiously babysit it as it goes through the implementation.
Less than half the time I end up with something that compiles, despite Claude spending hundreds of thousands of tokens desperately throwing stuff against the wall trying to make it work.
Getting through this process takes me about as long as just writing the code myself would have, AND then I still have to do a meticulous line-by-line review where I typically find quite a lot to fix. I really can't form a strong opinion about the efficiency of this whole thing. It's possible this is faster. It's possible that it's not. It's definitely very high variance.
I am getting better at pattern matching on things AI will do competently. But it's not a long list and it's not much of the work I actually do in a day. Really the biggest benefit is that I end up with better documentation because I generated all of that to try and make the whole thing actually work in the first place.
Either I am doing something wrong, the work that AI excels at looks very different than mine, or people are just lying.
I'm kind of surprised. Certainly the model has a locality bias and an action bias by default, which can be partially mitigated by claude.md instructions (though it isn't great at following them if there are too many). Without some additional meta-process, this can lead to hacky solutions.
I've been experimenting with different ways for the model to get the necessary context to understand where the code should live and the patterns it should use.
I have used planning mode only a little. (I was just out of the country for 3 weeks and not coding, and it only became available just before I left; it wasn't a requirement in my past experience anyway.)
The only BIG thing I want from Claude Code right now is a "Yes, and..." option for accepting code edits, so I can steer the next step while accepting the current change.
When I point it at my projects though, the outcomes are much less reliable and often quite frustrating.
However, if you can quickly read code, see the better solution, and succinctly communicate it, you can easily 10x-20x your ability to code.
I'm beginning to believe it may primarily come down to having the vocabulary and linguistic ability to succinctly and clearly state the gaps in the code.
Do you believe you've managed to overturn the most common wisdom in the software engineering industry, that reading code is much harder than writing it? If you have, then you should write up a white paper for the rest of us to follow.
Because every time I've seen someone say this, it's been someone who doesn't actually read the code they're reviewing.
Working with production code basically means jumping straight to the ball-of-mud phase: maybe somewhat less tangled, but usually a much, much larger codebase. It's very hard to describe to an LLM what to even do, since most mature production code has such a complex web of interactions to consider.
How are you measuring this? Are you actually saying that you _feel_ slightly more productive?
I think it is funny how people act like it is a new problem. If the AI is having trouble with a "ball of mud", don't make mud balls (or learn to carve out abstractions). This cognitive load is impacting everyone working on that codebase. Skilled engineers enable less skilled engineers to flourish by creating code bases where change is easy because the code is modular and self-contained.
I think one sad fact is that many, maybe most, engineers don't have the skills to refactor mature code to make it modular. This also means they can't tell the AI what kind of refactoring it should make.
Without any guidance Claude will make mud balls because of two tendencies: the tendency to put code where it is consumed, and the tendency to act instead of researching.
There are also some second-level tendencies you need to understand, like the tendency to do only a partial migration when changing patterns.
These tendencies are not even unique to the AI; I'm sure we have all worked with people like that.
So to counteract these tendencies, just apply the same skills you use for reading code and recognizing when an abstraction is leaky or a method doesn't align with your component boundary. Then you too can have AI building pretty good componentized code.
For example, my current pet project has a clear CQRS API, access control proxies, and repositories for data access, with clearly defined service boundaries.
It is easy for me to see when the AI makes a mistake, such as not using the data repository or the access control layer, because it has to add an import statement and a dependency that I don't want. All I have to do is nudge it in another direction.
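To make that layering concrete, here is a minimal Python sketch of the kind of structure described: a repository for data access, an access-control proxy wrapping it, and a CQRS-style service with separate command and query handlers. Everything here is hypothetical and for illustration only (`TaskRepository`, `AccessControlledTaskRepository`, `CreateTask`, `GetTaskTitle`, `make_task_service` are invented names, not the commenter's actual code), and it assumes a simple owner-only access rule.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass(frozen=True)
class Task:
    task_id: str
    owner: str
    title: str


class TaskRepository(Protocol):
    """Data-access boundary: the only interface services are allowed to depend on."""
    def get(self, task_id: str) -> Optional[Task]: ...
    def save(self, task: Task) -> None: ...


class InMemoryTaskRepository:
    """Hypothetical concrete store; a real project might back this with a database."""
    def __init__(self) -> None:
        self._tasks: Dict[str, Task] = {}

    def get(self, task_id: str) -> Optional[Task]:
        return self._tasks.get(task_id)

    def save(self, task: Task) -> None:
        self._tasks[task.task_id] = task


class AccessDenied(Exception):
    pass


class AccessControlledTaskRepository:
    """Proxy that wraps any repository and enforces an (assumed) owner-only rule."""
    def __init__(self, inner: TaskRepository, current_user: str) -> None:
        self._inner = inner
        self._user = current_user

    def get(self, task_id: str) -> Optional[Task]:
        task = self._inner.get(task_id)
        if task is not None and task.owner != self._user:
            raise AccessDenied(task_id)
        return task

    def save(self, task: Task) -> None:
        existing = self._inner.get(task.task_id)
        # Reject writes on behalf of someone else, or overwrites of another user's task.
        if task.owner != self._user or (existing is not None and existing.owner != self._user):
            raise AccessDenied(task.task_id)
        self._inner.save(task)


# CQRS: writes arrive as commands, reads as queries, handled separately.
@dataclass(frozen=True)
class CreateTask:
    task_id: str
    title: str


@dataclass(frozen=True)
class GetTaskTitle:
    task_id: str


class TaskService:
    """Depends only on the TaskRepository interface, never on a concrete store."""
    def __init__(self, repo: TaskRepository, current_user: str) -> None:
        self._repo = repo
        self._user = current_user

    def handle_command(self, cmd: CreateTask) -> None:
        self._repo.save(Task(cmd.task_id, self._user, cmd.title))

    def handle_query(self, query: GetTaskTitle) -> Optional[str]:
        task = self._repo.get(query.task_id)
        return task.title if task is not None else None


def make_task_service(store: TaskRepository, user: str) -> TaskService:
    # Composition root: the only place where the raw store meets the proxy.
    return TaskService(AccessControlledTaskRepository(store, user), user)
```

In a real project these pieces would live in separate modules, so if the AI wrote a service that reached into `InMemoryTaskRepository` directly instead of going through the proxy, the change would show up as exactly the kind of unwanted import and dependency the comment describes.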
We saw the same thing with blockchain. We started seeing the most ridiculous attempts to integrate blockchain, by companies where it didn't even make any sense. But it was all because doing so excited investors and boosted stock prices and valuations, not because consumers wanted it.
You cannot effectively employ a team of twenty junior developers if you have to review all of their code (unless you have like seven senior developers, too).
But this isn't a point that needs to be debated. If it is true that LLMs can be as effective as a team of 20 junior developers, then we should be seeing many people quickly producing software that previously required 20 junior devs.
> but the model is "smarter" then a junior engineer usually
And it is also usually worse than interns in some crucial respects. For example, you cannot trust the models to reliably tell you what you need to know, such as difficulties they've encountered or important insights they've learnt, or to understand that these things are important to communicate.