
196 points zmccormick7 | 11 comments
aliljet ◴[] No.45387614[source]
There's a broader misunderstanding here. Context could be infinite, but the real bottleneck is understanding intent late in a multi-step operation. A human can effectively discard or disregard prior information as the narrow window of focus moves to a new task; LLMs seem incredibly bad at this.

Having more context while remaining unable to focus effectively on the latest task is the real problem.

replies(10): >>45387639 #>>45387672 #>>45387700 #>>45387992 #>>45388228 #>>45388271 #>>45388664 #>>45388965 #>>45389266 #>>45404093 #
1. tptacek ◴[] No.45388271[source]
Asking, not arguing, but: why can't they? You can give an agent access to its own context and ask it to lobotomize itself like Eternal Sunshine. I just did that with a log ingestion agent (broad search to get the lay of the land, which eats a huge chunk of the context window, then narrow searches for weird stuff it spots, then go back and zap the big log search). I assume this is a normal approach, since someone else suggested it to me.
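
The shape of that "zap it later" trick is simple enough to sketch. This is a minimal illustration, not the actual log agent, and all names are hypothetical: the agent keeps its own message list and exposes a tool the model can call to delete entries by id, so the huge broad-search blob can be dropped once the narrow follow-ups are done.

    # Minimal sketch of an agent that can prune its own context (hypothetical names).
    messages = []   # the context sent to the model on every turn
    next_id = 0

    def remember(role, content):
        """Append a message, tagged with an id the model can refer back to."""
        global next_id
        messages.append({"id": next_id, "role": role, "content": content})
        next_id += 1

    def forget(ids):
        """Tool exposed to the model: drop the listed entries from its own context."""
        global messages
        messages = [m for m in messages if m["id"] not in set(ids)]
        return f"forgot {len(ids)} message(s)"

    # Flow for the log-ingestion example above: the broad search result lands as,
    # say, id 3; the model runs its narrow follow-up searches; then it calls
    # forget([3]) and the big blob stops costing context window space every turn.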
replies(2): >>45388348 #>>45388456 #
2. simonw ◴[] No.45388348[source]
This is also the idea behind sub-agents. Claude Code answers questions about things like "where is the code that does X" by firing up a separate LLM running in a fresh context, posing it the question and having it answer back when it finds the answer. https://simonwillison.net/2025/Jun/2/claude-trace/
replies(2): >>45388378 #>>45388417 #
3. tptacek ◴[] No.45388378[source]
I'm playing with that too (everyone should write an agent; basic sub-agents are incredibly simple --- just tool calls that can make their own LLM calls, or even just a tool call that runs in its own context window). What I like about Eternal Sunshine is that the LLM can just make decisions about what context stuff matters and what doesn't, which is a problem that comes up a lot when you're looking at telemetry data.
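
A sub-agent really can be that small. Here's a rough sketch, with the OpenAI SDK standing in as one concrete choice of chat API (nothing about the pattern depends on it): the parent agent sees research() as just another tool, each call runs in a fresh conversation, and only the final answer string returns to the parent's context.

    # Rough sketch of a sub-agent as "a tool call that runs in its own context window".
    from openai import OpenAI

    client = OpenAI()

    def research(question: str) -> str:
        """Tool exposed to the parent agent: answer one question in a fresh context."""
        messages = [
            {"role": "system", "content": "Answer the question, then stop."},
            {"role": "user", "content": question},
        ]
        # A real sub-agent would loop here, handling its own tool calls
        # (search, read file, ...) before settling on an answer.
        resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
        return resp.choices[0].message.content  # only this string re-enters the parent's context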
4. tra3 ◴[] No.45388417[source]
I keep wondering if we're forgetting the fundamentals:

> Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?

https://www.laws-of-software.com/laws/kernighan/

Sure, you eat the elephant one bite at a time, and recursion is a thing, but I wonder where the tipping point here is.

replies(1): >>45388460 #
5. libraryofbabel ◴[] No.45388456[source]
Yes! - and I wish this was easier to do with common coding agents like Claude Code. Currently you can kind of do it manually by copying the results of the context-busting search, rewinding history (Esc Esc) to remove the now-useless stuff, and then dropping in the results.

Of course, subagents are a good solution here, as another poster already pointed out. But it would be nice to have something more lightweight and automated, maybe just turning on a mode where the LLM is asked to throw things out according to its own judgement, if you know you're going to be doing work with a lot of context pollution.

replies(1): >>45388468 #
6. tptacek ◴[] No.45388460{3}[source]
I think recursion is the wrong way to look at this, for what it's worth.
replies(1): >>45388565 #
7. tptacek ◴[] No.45388468[source]
This is why I'm writing my own agent code instead of using simonw's excellent tools or just using Claude; the most interesting decisions are in the structure of the LLM loop itself, not in how many random tools I can plug into it. It's an unbelievably small amount of code to get to the point of super-useful results; maybe like 1500 lines, including a TUI.
replies(1): >>45390488 #
8. tra3 ◴[] No.45388565{4}[source]
I meant recursion and memoization only as a general approach to solving "large" problems.

I really want to paraphrase Kernighan's law as applied to LLMs: "If you use your whole context window to code a solution to a problem, how are you going to debug it?"

replies(1): >>45388913 #
9. tptacek ◴[] No.45388913{5}[source]
By checkpointing once the agent loop has decided it's ready to hand off a solution, generating a structured summary of all the prior elements in the context, writing that to a file, and then marking all those prior context elements as dead so they don't occupy context window space.

Look carefully at a context window after solving a large problem, and I think in most cases you'll see even the 90th percentile token --- to say nothing of the median --- isn't valuable.

However large we're allowing frontier model context windows to get, we've got an integer multiple more semantic space to allocate if we're even just a little bit smart about managing that resource. And again, this is assuming you don't recurse or divide the problem into multiple context windows.
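
One way to read that checkpoint step concretely (an interpretation, not the actual agent code): once the loop decides it's ready to hand off, summarize the conversation, write the summary to disk, and swap the dead messages for a single pointer to it. summarize() below stands in for whatever call produces the structured recap.

    import json

    def checkpoint(messages, summarize, path="checkpoint.json"):
        """Collapse a finished stretch of context into one summary message."""
        summary = summarize(messages)            # structured recap of the prior elements
        with open(path, "w") as f:
            json.dump({"summary": summary}, f)   # full recap kept on disk, recoverable later
        # Everything before this point is now "dead": it no longer occupies
        # context window space, but the decisions it contained stay recoverable.
        return [{"role": "user",
                 "content": f"Summary of prior work (full log in {path}):\n{summary}"}]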

10. libraryofbabel ◴[] No.45390488{3}[source]
And even if you do use Claude for actual work, there is also immense pedagogical value in writing an agent from scratch. Something really clicks when you actually write the LLM + tool calls loop yourself. I ran a workshop on this at my company and we wrote a basic CLI agent in only 120 lines of Python, with just three tools: list files, read file, and (over)write file. (At that point, the agent becomes capable enough that you can set it to modifying itself and ask it to add more tools!) I think it was an eye-opener for a lot of people to see what the core of these things looks like. There is no magic dust in the agent; it's all in the LLM black box.
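
For readers who want to see the shape of that loop, here is a compressed sketch along the same lines (not the workshop's code; the OpenAI tool-calling API is used only as one concrete choice): a model, three file tools, and a loop that runs whatever tools the model asks for and feeds the results back.

    # Sketch of the core LLM + tool-calls loop with three file tools.
    import json, os
    from openai import OpenAI

    client = OpenAI()

    def list_files(path="."):
        return "\n".join(sorted(os.listdir(path)))

    def read_file(path):
        with open(path) as f:
            return f.read()

    def write_file(path, content):
        with open(path, "w") as f:
            f.write(content)
        return f"wrote {len(content)} chars to {path}"

    TOOLS = {"list_files": list_files, "read_file": read_file, "write_file": write_file}
    SPECS = [
        {"type": "function", "function": {"name": "list_files", "parameters": {
            "type": "object", "properties": {"path": {"type": "string"}}}}},
        {"type": "function", "function": {"name": "read_file", "parameters": {
            "type": "object", "properties": {"path": {"type": "string"}},
            "required": ["path"]}}},
        {"type": "function", "function": {"name": "write_file", "parameters": {
            "type": "object", "properties": {"path": {"type": "string"},
                                             "content": {"type": "string"}},
            "required": ["path", "content"]}}},
    ]

    messages = [{"role": "user", "content": input("> ")}]
    while True:
        resp = client.chat.completions.create(model="gpt-4o-mini",
                                              messages=messages, tools=SPECS)
        msg = resp.choices[0].message
        messages.append(msg)
        if not msg.tool_calls:          # nothing left to run: print the answer and stop
            print(msg.content)
            break
        for call in msg.tool_calls:     # execute each requested tool, feed the result back
            args = json.loads(call.function.arguments)
            result = TOOLS[call.function.name](**args)
            messages.append({"role": "tool", "tool_call_id": call.id, "content": result})

From there, as described above, it really can be pointed at its own source file and asked to add more tools.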

I hadn't considered actually rolling my own for day-to-day use, but now maybe I will. Although it's worth noting that Claude Code Hooks do give you the ability to insert your own code into the LLM loop - though not to the point of Eternal Sunshining your context, it's true.

replies(1): >>45400445 #
11. JambalayaJimbo ◴[] No.45400445{4}[source]
Do you have this workshop available online? I’m really struggling to understand what “tool calls” and MCP are!