Context is the bottleneck for coding agents now

(runnercode.com)

Show context

aliljet ◴[26 Sep 25 15:27 UTC] No.45387614[source]▶

There's a misunderstanding here broadly. Context could be infinite, but the real bottleneck is understanding intent late in a multi-step operation. A human can effectively discard or disregard prior information as the narrow window of focus moves to a new task, LLMs seem incredibly bad at this.

Having more context, but leaving open an inability to effectively focus on the latest task is the real problem.

replies(10): >>45387639 #>>45387672 #>>45387700 #>>45387992 #>>45388228 #>>45388271 #>>45388664 #>>45388965 #>>45389266 #>>45404093 #

bgirard ◴[26 Sep 25 15:34 UTC] No.45387700[source]▶

>>45387614 #

I think that's the real issue. If the LLM spends a lot of context investigating a bad solution and you redirect it, I notice it has trouble ignoring maybe 10K tokens of bad exploration context against my 10 line of 'No, don't do X, explore Y' instead.

replies(6): >>45387838 #>>45387902 #>>45388477 #>>45390299 #>>45390619 #>>45394242 #

1. ericmcer ◴[26 Sep 25 19:44 UTC] No.45390299[source]▶

>>45387700 #

It seems possible for openAI/Anthropic to rework their tools so they discard/add relevant context on the fly, but it might have some unintended behaviors.

The main thing is people have already integrated AI into their workflows so the "right" way for the LLM to work is the way people expect it to. For now I expect to start multiple fresh contexts while solving a single problem until I can setup a context that gets the result I want. Changing this behavior might mess me up.

replies(2): >>45391352 #>>45391400 #

2. sshine ◴[26 Sep 25 21:48 UTC] No.45391352[source]▶

>>45390299 (TP) #

> rework their tools so they discard/add relevant context on the fly

That may be the foundation for an innovation step in model providers. But you can achieve a poor man’s simulation if you can determine, in retrospect, when a context was at peak for taking turns, and when it got too rigid, or too many tokens were spent, and then simply replay the context up until that point.

I don’t know if evaluating when a context is worth duplicating is a thing; it’s not deterministic, and it depends on enforcing a certain workflow.

3. vel0city ◴[26 Sep 25 21:54 UTC] No.45391400[source]▶

>>45390299 (TP) #

A number of agentic coding tools do this. Upon an initial request for a larger set of actions, it will write a markdown file with its "thoughts" on its plan to do something, and keep notes as it goes. They'll then automatically compact their contexts and re-read their notes to keep "focused" while still having a bit of insight on what it did previously and what the original ask was.

replies(2): >>45391466 #>>45404113 #

4. cvzakharchenko ◴[26 Sep 25 22:02 UTC] No.45391466[source]▶

>>45391400 #

Interesting. I know people do this manually. But are there agentic coding tools that actually automate this approach?

replies(2): >>45391876 #>>45392793 #

5. sshine ◴[26 Sep 25 22:58 UTC] No.45391876{3}[source]▶

>>45391466 #

Claude Code has /init and /compact that do this. It doesn’t recreate the context as-is, but creates a context that is presumed to be functionally equivalent. I find that’s not the case and that building up from very little stored context and a lot of specialised dialogue works better.

6. vel0city ◴[27 Sep 25 02:16 UTC] No.45392793{3}[source]▶

>>45391466 #

I've seen this behavior with Cursor, Windsurf, and Amazon Q. It normally only does it for very large requests from what I've seen.

7. tom_m ◴[28 Sep 25 13:14 UTC] No.45404113[source]▶

>>45391400 #

This does help, yes. Todo lists are important. They also reinforce order of operations.

↑