I work on a large product with two decades of accumulated legacy; maybe that's the problem. I can see, though, how generating and editing a simple greenfield web frontend project could work much better, as long as the actual complexity is low.
When I give AI a smaller or more focused project, it's magical. I've been using Claude Code to write code for ESP32 projects and it's really impressive. OTOH, it failed to tell me about a standard device driver I could be using instead of a community device driver I found. I think any human who works on ESP-IDF projects would have pointed that out.
AI's failings are always a little weird.
This is not the case for most monoliths, unless they are structured into LLM-friendly components that resemble patterns the models have seen millions of times in their training data, such as React components.
Definitely. I've found Claude at least isn't so good at working in large existing projects, but great at greenfielding.
Most of my use these days is having it write specific functions and tests for them, which, in fairness, saves me a ton of time.
Another good use case is to use it for knowledge searching within a codebase. I find that to be incredibly useful without much context "engineering"
public static double ScoreItem(Span<byte> candidate, Span<byte> target)
{
    // TODO: Return the normalized Levenshtein distance between the 2 byte sequences.
    // ... any additional edge cases here ...
}
I think generating more than one method at a time is playing with fire. Individual methods can be generated by the LLM and tested in isolation. You can incrementally build up and trust your understanding of the problem space by going a little bit slower. If the LLM is operating over a whole set of methods at once, it is like starting over each time you have to iterate.

It's a genuine productivity boost, but I don't feel like it's AI slop; sometimes it feels like it's actually reading my mind and just preventing me from having to type...
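For reference, a minimal sketch of what I'd expect back for that stub (my own rough hand-written version: the classic two-row dynamic-programming Levenshtein, normalized by the longer sequence's length so the score lands in [0, 1]):

public static double ScoreItem(Span<byte> candidate, Span<byte> target)
{
    // Edge cases: two empty sequences are identical; otherwise an empty
    // sequence is a full-length edit away from the other one.
    if (candidate.Length == 0 && target.Length == 0) return 0.0;
    if (candidate.Length == 0 || target.Length == 0) return 1.0;

    // Two-row dynamic programming table for the classic Levenshtein recurrence.
    var previous = new int[target.Length + 1];
    var current = new int[target.Length + 1];
    for (int j = 0; j <= target.Length; j++) previous[j] = j;

    for (int i = 1; i <= candidate.Length; i++)
    {
        current[0] = i;
        for (int j = 1; j <= target.Length; j++)
        {
            int substitutionCost = candidate[i - 1] == target[j - 1] ? 0 : 1;
            current[j] = Math.Min(
                Math.Min(previous[j] + 1,     // deletion
                         current[j - 1] + 1), // insertion
                previous[j - 1] + substitutionCost);
        }
        (previous, current) = (current, previous);
    }

    // Normalize by the longer sequence so the score falls in [0, 1].
    return (double)previous[target.Length] / Math.Max(candidate.Length, target.Length);
}

Normalizing by the longer length is my own assumption about what "normalized" should mean here; the point is that a single, testable method like this is a scope the model handles reliably.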
I've tried vibe coding and usually end up with something subtly or horribly broken, with excessive levels of complexity. Once it digs itself into a hole, it's very difficult to extricate it even with explicit instruction.
I've had net time savings with bigger agentic tasks, but I still have to check the result line by line when it's done, because it takes lazy shortcuts and sometimes just outright gets things wrong.
Big productivity boost - it takes out the worst of my job - but I still can't trust it at much above the micro scale.
I wish I could give the tab complete a system prompt; there are a couple of things it does over and over that I'm sure I could prompt away, but there's no way to feed that in that I know of.
Now I use agentic coding a lot with maybe 80-90% success rate.
I'm on greenfield projects (my startup), and maintaining strict Markdown files with architecture decisions and examples helps a lot.
I barely write code anymore; I mostly do code review and maintain the documentation.
In existing pre-AI codebases I think it's near impossible, because I've never worked anywhere that maintained documentation. It was always a chore.
I'm in a similar situation, and for the first time ever I'm actually considering if a rewrite to microservices would make sense, with a microservice being something small enough an AI could actually deal with - and maybe even build largely on its own.
Also - Claude (~the best coding agent currently, imo) will make mistakes, sometimes many of them - tell it to test the code it writes and make sure it's working. I've generally found it's pretty good at debugging/testing and fixing its own mistakes.
That’s the typical “claude code writes all my code” setup. That’s my setup.
This does require you to fit your problem to the solution. But when you do, the results are tremendous.
E.g. it's great for refactoring now; it's often updating the README along with renames without me asking. It's also really good at rebasing quickly, but only by cherry-picking inside a worktree. Churning out small components I don't want to add a new dependency for - those are usually good on the first try.
For implementing whole features, the space of possible solutions is way too big to always hit something that I'll be satisfied with. Once I have an idea of how to implement something in broad strokes, I can give it a very error-ridden first draft as a stream of thoughts, let it read all the required files, and have it make an implementation plan. Usually that's not too far off, and it doesn't take that long. Once that's done, Opus 4.5 is pretty good at implementing that plan. Still, I read every line if it's going to production.
Instead of dealing with the intricacies of directly writing the code, I explain to the AI what we are trying to achieve next and what approach I prefer. This way I am still on top of it, I am able to understand the quality of the code it generated, and I'm the one who integrates everything.
So far I have found the tools that are supposed to be able to edit the whole codebase at once to be useless. I instantly lose perspective when the AI IDE fiddles with multiple code blocks and does some magic. The chatbot interface is superior for me, as the control stays with me and I still follow the code writing step by step.
In contrast, a poorly designed microservice can be replaced much more easily. You can identify the worst-performing and most problematic microservices and replace them selectively.
Let's say you want to add new functionality, for example plugging into the shared user service that another service in the same monorepo already uses: the AI will be really good at identifying an existing example and applying it to your service.
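A hypothetical sketch of what that looks like in practice (every name below is invented for illustration): the agent finds how a sibling service already consumes the shared user service and transplants the same shape into yours.

using System;
using System.Threading.Tasks;

// Shared client the services in the monorepo already depend on (hypothetical).
public interface IUserServiceClient
{
    Task<UserDto> GetUserAsync(Guid userId);
}

public record UserDto(Guid Id, string Email);

// The existing pattern the agent can discover in a sibling service...
public class OrdersService
{
    private readonly IUserServiceClient _users;
    public OrdersService(IUserServiceClient users) => _users = users;

    public async Task<string> DescribeOrderOwnerAsync(Guid ownerId)
    {
        var user = await _users.GetUserAsync(ownerId);
        return $"Order placed by {user.Email}";
    }
}

// ...and the same pattern applied to the service you're extending.
public class ReportsService
{
    private readonly IUserServiceClient _users;
    public ReportsService(IUserServiceClient users) => _users = users;

    public async Task<string> DescribeReportOwnerAsync(Guid ownerId)
    {
        var user = await _users.GetUserAsync(ownerId);
        return $"Report created by {user.Email}";
    }
}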
Ironically, this would be the best workflow with humans too.
That's exactly my experience. While a well-structured monolith is a good idea in theory, and I'm sure such examples exist in practice, that has never been the case in any of my jobs. Friends working at other companies report similar experiences.
Using an agentic system that can at least read the other bits of code is more efficient than copypasting snippets to a web page.
* My 5-year-old project: a monorepo with a backend, 2 front-ends, and 2 libraries
* A 10+ year old company project: about 20 various packages in a monorepo
In both cases I successfully give Claude Code or OpenCode instructions either at package level or monorepo level. Usually I prefer package level.
E.g. just now I gave instructions in my personal project: "Invoice styles in /app/settings/invoice should be localized". It figured out that the unlocalized strings come from a library package, added the strings to the code and message files (including missing translations), but it did not clean up the hardcoded strings from the library. Since I know the code, I wrote an extra prompt, "Maybe INVOICE_STYLE_CONFIGS can be cleaned-up in such case", and it cleaned up exactly what I expected, then ran tests and linting.
Most code is about patterns, specific code styles and reusing existing libraries. Without context none of that can be applied to the solution.
If you put a programmer in a room and give them a piece of paper with a function and say OPTIMISE THAT! - is it going to be their best work?