Coding with LLMs in the summer of 2025 – an update

I have used Claude's GitHub Action quite a bit now (10-20 issue implementations, a few more PR reviews), and it is hit and miss, so I agree with the enhanced-coding approach rather than just letting it run loose.

When the change is a very small, self-contained feature or refactor, it can mostly work alone. If you have tests that cover the feature, it is relatively safe (and you can do other stuff because it is running in an action, which is a big plus...write the issue and you are done, and sometimes I have had Claude write the issue too).

When it gets to a more medium size, it will often produce something that appears to work but actually doesn't. Maybe I don't have enough test coverage and it is my fault, but it does this the majority of the time. I have tried writing the issue myself, adding more info to claude.md, and letting Claude write the issue so it is in language it understands, but nothing works. It is quite frustrating because you spend time on the review and then see something wrong.

And with anything bigger, unsurprisingly, it doesn't do well.

PR reviews are good for small/medium tasks too. The bar is lower here though: much of the feedback is useless, but it does catch things I have missed.

So, imo, it is still quite a way from being able to do things independently. For small tasks, I just get Claude to write the issue and wait for the PR...that is great. For medium tasks (which is most tasks), I don't need to do much actual coding, just direct Claude...which still means my productivity is way up.

I did try Gemini, but I found that when you let it off the leash and accept all edits, it would go wild. We have Copilot at work reviewing PRs, and it isn't so great. Maybe Gemini is better on large codebases where, I assume, Claude will struggle.