I find AI helpful but nowhere near a multiplier in my day-to-day development experience. Converting a CSV to JSON or vice versa, great, but AI writing code for me has been less helpful. Beyond boilerplate, it introduces subtle bugs that are a pain in the ass to deal with. For complicated things, it struggles and does too much, and because I didn't write the code I don't know where the bad spots are. And AI code review often gets hung up on nits and misses real mistakes.
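To be clear, the kind of conversion I mean is nothing fancy; roughly this, in Python with just the standard library (file names are placeholders):

    import csv
    import json

    # Read the CSV (placeholder file name) into a list of dicts,
    # then write it back out as a JSON array.
    with open("input.csv", newline="") as f:
        rows = list(csv.DictReader(f))

    with open("output.json", "w") as f:
        json.dump(rows, f, indent=2)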
So what are you doing and what are the resources you'd recommend?
Makes it seem like the actual problem to be solved is reducing the amount of boilerplate code that needs to be written, not using an LLM to do it.
I'm not smart enough to write a language or modify one, so this opinion is strongly spoken, weakly held.
To give a concrete example: I'm pretty good at doing Python coding on a whiteboard, because that's what I practiced for job interviews, and when I first learned Python I used Vim without setting up any Python integration.
I'm pretty terrible at doing Rust on a whiteboard, because I only learned it when I had a decent IDE and later even AI support.
Nevertheless, I don't think I'm a better programmer in Python.
Basically, I treat LLMs like a fairly competent unpaid intern and extend about the same level of trust to the output they produce.
What you will find is that the agent is much more successful in this regard.
The LLM has certain intrinsic abilities that mirror ours: like us, it can't actually write 10,000 lines of code and have everything working in one go. It does better when you develop incrementally and verify each increment. The smaller the increments, the better it performs.
Unfortunately the chain-of-thought process doesn't really do this. It can come up with steps, but sometimes the steps are too big, and it almost never properly verifies that things are working after each increment. That's why you have to put yourself in the loop here.
Letting the computer run tests and verify that the application works as expected at each step, and even come up with what verification means, is part of what's missing here. This part isn't automated yet, but I think it can easily be automated, with humans becoming less and less involved and moving into a more and more supervisory role.
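As a rough sketch of what I mean by putting verification in the loop: something like this could run the test suite after every increment and refuse to continue on failure (assumes a pytest-based project; the helper and its names are hypothetical):

    import subprocess
    import sys

    # Hypothetical helper: run the test suite after each increment the
    # agent produces and stop if anything fails.
    def verify_increment(description: str) -> bool:
        print(f"Verifying increment: {description}")
        result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
        if result.returncode != 0:
            print(result.stdout)
            print("Verification failed; fix this increment before moving on.")
            return False
        return True

    if __name__ == "__main__":
        ok = verify_increment(sys.argv[1] if len(sys.argv) > 1 else "latest change")
        sys.exit(0 if ok else 1)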
The first thing I'll note is that Claude Code with Claude 4 has been vastly better than everything else for me. Before that it was more like a 5-10% increase in productivity.
My workflow with Claude Code is very plain. I give it a relatively short prompt and ask it to create a plan. I iterate on the plan several times. I ask it to give me a more detailed plan. I iterate on that several times, then have Claude write it down and /clear to reset context.
Then, I'll usually do one or more "prototype" runs where I implement a solution with relatively little attention to code quality, to iron out any remaining uncertainties. Then I throw away that code, start a new branch, and implement it again while babysitting closely to make sure the code is good.
The major difference here is that I'm able to test out 5-10 designs in the time I would normally try 1 or 2. So I end up exploring a lot more, and committing better solutions.
Sometimes I'll even go a bit crazy on this planning thing and do something similar to what this guy shows: https://www.youtube.com/watch?v=XY4sFxLmMvw I tend to steer the process more myself, but typing whatever vague ideas are in my mind and ending up minutes later with a milestone and ticket list is very enabling, even if it isn't perfect.
I also do more "drive-by" small improvements:
- Annoying things that weren't important enough to justify a side quest of writing a shell script now have a shell script or an Ansible playbook.
- That ugly CSS in an internal tool untouched for 5 years? Fixed in 1 minute.
- The small prototype put into production with zero documentation years ago? I ask an agentic tool to write a basic README and then edit it a bit so it doesn't lie; well worth 15 minutes.
I also give it a first shot at finding the cause of bugs/problems. Most of the time it doesn't work out, but in the last week it immediately found the cause of some long-standing subtle problems we had in a couple of places.
I have also sometimes had luck providing it with single functions or modules that work but need some improvement (make this more DRY, improve error handling, log this or that...). Here I'm very conservative with the results because, as you said, it can be dangerous.
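To give a made-up example of the kind of single-function improvement I mean (the function and names are invented, not from a real codebase): take a working config loader and have the LLM add error handling and logging without changing its behaviour.

    import json
    import logging

    logger = logging.getLogger(__name__)

    # Invented example: the "after" version of a function that used to just
    # open the file and json.load it with no error handling or logging.
    def load_config(path: str) -> dict:
        try:
            with open(path) as f:
                config = json.load(f)
        except (OSError, json.JSONDecodeError) as exc:
            logger.error("Failed to load config from %s: %s", path, exc)
            return {}
        logger.info("Loaded %d config keys from %s", len(config), path)
        return config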
So am I more productive? I guess so. I don't think it's 4x or even 2x, and I don't think projects are getting done much faster overall, but stuff that wouldn't have been done otherwise is getting done.
What usually falls flat is trying to go a more "vibe-coding" route. I have tried to build a couple of small internal tools and things like that, and after promising starts, the agents just can't deal with the complexity without needing so much help that I'd go faster by myself.