Context is the bottleneck for coding agents now

1. marstall [26 Sep 25 15:23 UTC] No.45387565

> Level 2 - One commit - Cursor and Claude Code work well for tasks in this size range.

I'll stop ya right there. I've spent the past few weeks fixing bugs in a big multi-tier app (which is what any production software is these days). My output per bug is always one commit, often one line.

Claude is an occasional help, nothing more. Certainly not generating the commit for me!

replies(3): >>45387658 >>45387714 >>45387799

2. [26 Sep 25 15:31 UTC] No.45387658

>>45387565 (TP)

3. SparkyMcUnicorn [26 Sep 25 15:36 UTC] No.45387714

>>45387565 (TP)

I'll stop you right there. I've been using Claude Code for almost a year on production software with pretty large codebases. Both multi-repo and monorepo.

Claude is able to create entire PRs for me that are clean, well written, and maintainable.

Can it fail spectacularly? Yes, and it does sometimes. Can it be given good instructions and produce results that feel like magic? Also yes.

replies(1): >>45387929

4. agf [26 Sep 25 15:44 UTC] No.45387799

>>45387565 (TP)

This is interesting, and I'd say you're not the target audience. If you want the code Claude writes to be line-by-line what you think is most appropriate as a human, you're not going to get it.

You have to be willing to accept "close-ish and good enough" to what you'd write yourself. I would say that most of the time I spend with Claude goes into getting from its initial try to "close-ish and good enough". If I were working on tiny changes of just a few lines, it would definitely be faster to write them myself. It's the hundreds of lines of boilerplate, logging, error handling, etc. that make the trade-off close to worth it.

replies(1): >>45388171

5. ljm [26 Sep 25 15:53 UTC] No.45387929

>>45387714

For finicky issues like that I often find that, in the time it takes to create a prompt with the necessary context, I was able to just make the one line tweak myself.

In a way that is still helpful, especially if the act of putting the prompt together brought you to the solution organically.

Beyond that, 'clean', 'well written' and 'maintainable' are all relative terms here. In a low quality, mega legacy codebase, the results are gonna be dogshit without an intense amount of steering.

replies(1): >>45388743

6. layer8 [26 Sep 25 16:17 UTC] No.45388171

>>45387799

The parent comment didn’t say anything about expecting the LLM output “to be line-by-line what you think is most appropriate as a human”?

replies(1): >>45397244

7. SparkyMcUnicorn [26 Sep 25 17:08 UTC] No.45388743

>>45387929

> For finicky issues like that I often find that, in the time it takes to create a prompt with the necessary context, I was able to just make the one line tweak myself.

I don't run into this problem. Maybe the type of code we're working on is just very different. In my experience, if a one-line tweak is the answer and I'm spending a lot of time tweaking a prompt, then I might be holding the tool wrong.

Agree on those terms being relative. Maybe a better way of putting it is that I'm very comfortable putting my name on it, deploying to production, and taking responsibility for any bugs.

8. agf [27 Sep 25 16:31 UTC] No.45397244

>>45388171

If I were making a single-line code change, editing away Claude's "style" would take me long enough that it would be slower than writing the change myself. I'm positing that this is also true for the parent commenter.