I'm curious: what do you do when the LLM starts hallucinating, or gets stuck in a loop of generating non-working code that it can't get out of? What do you do when you need to troubleshoot and fix an issue it introduced, but has no idea how to fix?
In my experience of these tools, including the flagship models discussed here, this is a deal-breaking problem. If I have to waste time re-prompting to make progress, and reviewing and fixing the generated code, it would be much faster if I wrote the code from scratch myself. The tricky thing is that unless you read and understand the generated code, you really have no idea whether you're progressing or regressing. You can ask the model to generate tests for you as well, but how can you be sure they're written correctly, or covering the right scenarios?
More power to you if you feel like you're being productive, but the difficult things in software development always come in later stages of the project[1]. The devil is always in the details, and modern AI tools are just incapable of getting us across that last 10%. I'm not trying to downplay their usefulness, or imply that they will never get better. I think current models do a reasonably good job of summarizing documentation and producing small snippets of example code I can reuse, but I wouldn't trust them for anything beyond that.
[1]: https://en.wikipedia.org/wiki/Ninety%E2%80%93ninety_rule