And I really wish I could trust an llm for that, or, indeed, any task. But I generally find the answers fall into one of these useless buckets:

1. Rewording the question as an answer (so common, so useless).

2. Trivial solutions that are correct, meaning one or two lines that are valid, but that I could easily have written myself faster than getting an agent involved, and without the other drawbacks on this list.

3. Wildly incorrect "solutions". I'm talking about code that doesn't even build, because the llm can't take proper direction on which version of a library to use, so it keeps giving results based on old information that is no longer relevant. Try resolving a webpack 5 issue: you'll get a lot of webpack 4 answers and none of them will work, even if you specify webpack 5 (see the sketch just below this list for the kind of fix those answers miss).

4. The absolute worst: subtly incorrect solutions that seem correct and are confidently presented as correct.

This has been my experience with basically every "oh wow, look what the llm can do" demo. I'm that annoying person who finds the bug mid-demo.
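For context on that webpack example: webpack 5 dropped the automatic Node core-module polyfills that webpack 4 shipped, so webpack 4-era answers that silently rely on them simply won't build. A working answer has to map or disable those modules explicitly, roughly like this minimal sketch (the specific shim packages here are illustrative, not prescribed):

```javascript
// webpack.config.js -- minimal sketch of the webpack 5-style fix that
// webpack 4-era answers miss: Node core modules are no longer polyfilled
// automatically, so they must be mapped to shims or disabled explicitly.
module.exports = {
  mode: "production",
  entry: "./src/index.js",
  resolve: {
    fallback: {
      // Map Node built-ins to browser shims you install yourself...
      crypto: require.resolve("crypto-browserify"),
      stream: require.resolve("stream-browserify"),
      // ...or opt out entirely when the code path never runs in the browser.
      fs: false,
    },
  },
};
```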
The problems are:

1. A person inexperienced in the domain will flounder for ages trying out crap that doesn't work and understanding nothing of it.

2. A person experienced in the domain will spend a reasonable amount of time correcting the llm, and personally, I'd much rather write my own code via tdd-driven emergent design: I'll understand it, and it will be proven to work when it's done.
I see that proponents of the tech often gloss over this and don't realise that they're actually spending more time overall, especially when having to polish out all the bugs. Or maintain the system.
Use whatever you want, but I've got zero confidence in the models, and I prefer to write code instead of gambling. But to each their own.
This is just not my experience with coding agents, which is interesting. You could chalk this up to me being a bad coder, insufficiently picky, or being fooled by plausible-looking code, whatever. But I carefully read every diff the agent suggests and force it to keep every diff small enough for that to be easy; I'm usually very good at spotting potential bugs and very picky about code quality; and the ultimate test passes: the generated code works, even when I test it extensively in daily usage. I wonder if maybe it has something to do with the technologies or specific models/agents you're using? Regarding version issues, I usually solve those by pointing the agent at a number of docs for the version I want and having it generate documentation for itself, then @'ing those docs in the prompt moving forward, or by using llms.txt if it's available. That usually works a charm for teaching it things.
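To make the docs step concrete, the setup can be as small as a script that pulls the version-pinned material into the repo so there's a stable local file to @ in every prompt. A rough sketch, where every URL and path is a placeholder rather than a real endpoint:

```javascript
// fetch-docs.js -- hypothetical sketch of the "pin the docs" step: pull
// version-specific reference material (e.g. a library's llms.txt, if it
// publishes one) into the repo so the agent reads that instead of relying
// on stale training data. URLs and paths below are placeholders.
const fs = require("node:fs/promises");

const DOC_SOURCES = [
  "https://example.com/my-library/v5/llms.txt", // swap in real docs URLs
];

async function main() {
  let combined = "";
  for (const url of DOC_SOURCES) {
    const res = await fetch(url); // global fetch, Node 18+
    if (!res.ok) throw new Error(`failed to fetch ${url}: ${res.status}`);
    combined += `\n\n<!-- source: ${url} -->\n${await res.text()}`;
  }
  // The agent is then pointed at docs/pinned-reference.md in each prompt.
  await fs.mkdir("docs", { recursive: true });
  await fs.writeFile("docs/pinned-reference.md", combined.trimStart());
  console.log("wrote docs/pinned-reference.md");
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```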
> I see that proponents of the tech often gloss over this and don't realise that they're actually spending more time overall, especially when having to polish out all the bugs. Or maintain the system.
I am a very fast, productive coder by hand. I guarantee you, I am much faster with agentic coding, simply in terms of the number of days it takes me to finish a feature or a greenfield prototype. And I don't think corrections are a confounding factor, because I very rarely have to correct these models. For some time I used an agent that tracks how often, as a percentage, I accept the tool calls and edits it suggests. One thing to know about me is that I do not ever accept subpar code: if I don't like an agent's suggestion, I don't accept it and then iterate on it; I want it to get it right the first time. My acceptance rate was 95%.