Most active commenters

tptacek(5)
neutronicus(4)
PantaloonFlames(4)
Someone1234(3)

Popular/hot comments

>>45387838 #
>>45387700 #
>>45387672 #
>>45387639 #
>>45387751 #
>>45387769 #
>>45388047 #

←back to thread

Context is the bottleneck for coding agents now

(runnercode.com)

1. aliljet ◴[26 Sep 25 15:27 UTC] No.45387614[source]▶

>>45387374 (OP) #

There's a misunderstanding here broadly. Context could be infinite, but the real bottleneck is understanding intent late in a multi-step operation. A human can effectively discard or disregard prior information as the narrow window of focus moves to a new task, LLMs seem incredibly bad at this.

Having more context, but leaving open an inability to effectively focus on the latest task is the real problem.

replies(10): >>45387639 #>>45387672 #>>45387700 #>>45387992 #>>45388228 #>>45388271 #>>45388664 #>>45388965 #>>45389266 #>>45404093 #

2. ray__ ◴[26 Sep 25 15:29 UTC] No.45387639[source]▶

>>45387614 (TP) #

This is a great insight. Any thoughts on how to address this problem?

replies(3): >>45387751 #>>45387782 #>>45387912 #

3. neutronicus ◴[26 Sep 25 15:32 UTC] No.45387672[source]▶

>>45387614 (TP) #

No, I think context itself is still an issue.

Coding agents choke on our big C++ code-base pretty spectacularly if asked to reference large files.

replies(4): >>45387769 #>>45388023 #>>45388024 #>>45388311 #

4. bgirard ◴[26 Sep 25 15:34 UTC] No.45387700[source]▶

>>45387614 (TP) #

I think that's the real issue. If the LLM spends a lot of context investigating a bad solution and you redirect it, I notice it has trouble ignoring maybe 10K tokens of bad exploration context against my 10 line of 'No, don't do X, explore Y' instead.

replies(6): >>45387838 #>>45387902 #>>45388477 #>>45390299 #>>45390619 #>>45394242 #

5. throwup238 ◴[26 Sep 25 15:40 UTC] No.45387751[source]▶

>>45387639 #

It has to be addressed architecturally with some sort of extension to transformers that can focus the attention on just the relevant context.

People have tried to expand context windows by reducing the O(n^2) attention mechanism to something more sparse and it tends to perform very poorly. It will take a fundamental architectural change.

replies(3): >>45387795 #>>45387930 #>>45388296 #

6. Someone1234 ◴[26 Sep 25 15:41 UTC] No.45387769[source]▶

>>45387672 #

Yeah, I have the same issue too. Even for a file with several thousand lines, they will "forget" earlier parts of the file they're still working in resulting in mistakes. They don't need full awareness of the context, but they need a summary of it so that they can go back and review relevant sections.

I have multiple things I'd love LLMs to attempt to do, but the context window is stopping me.

replies(3): >>45388022 #>>45388231 #>>45390936 #

7. aliljet ◴[26 Sep 25 15:42 UTC] No.45387782[source]▶

>>45387639 #

For me? It's simple. Completely empty the context and rebuild focused on the new task at hand. It's painful, but very effective.

replies(1): >>45387836 #

8. buddhistdude ◴[26 Sep 25 15:44 UTC] No.45387795{3}[source]▶

>>45387751 #

Can one instruct an LLM to pick the parts of the context that will be relevant going forward? And then discard the existing context, replacing it with the new 'summary'?

9. ◴[26 Sep 25 15:47 UTC] No.45387836{3}[source]▶

>>45387782 #

10. dingnuts ◴[26 Sep 25 15:47 UTC] No.45387838[source]▶

>>45387700 #

that's because a next token predictor can't "forget" context. That's just not how it works.

You load the thing up with relevant context and pray that it guides the generation path to the part of the model that represents the information you want and pray that the path of tokens through the model outputs what you want

That's why they have a tendency to go ahead and do things you tell them not to do..

also IDK about you but I hate how much praying has become part of the state of the art here. I didn't get into this career to be a fucking tech priest for the machine god. I will never like these models until they are predictable, which means I will never like them.

replies(8): >>45387906 #>>45387974 #>>45387999 #>>45388198 #>>45388215 #>>45388542 #>>45388863 #>>45390695 #

11. rco8786 ◴[26 Sep 25 15:51 UTC] No.45387902[source]▶

>>45387700 #

I think the general term for this is "context poisoning" and is related but slightly different to what the poster above you is saying. Even with a "perfect" context, the LLM still can't infer intent.

12. victorbjorklund ◴[26 Sep 25 15:51 UTC] No.45387906{3}[source]▶

>>45387838 #

You can rewrite the history (but there are issues with that too). So an agent can forget context. Simply dont feed in part of the context on the next run.

13. atonse ◴[26 Sep 25 15:52 UTC] No.45387912[source]▶

>>45387639 #

Do we know if LLMs understand the concept of time? (like i told you this in the past, but what i told you later should supersede it?)

I know there classes of problems that LLMs can't natively handle (like doing math, even simple addition... or spatial reasoning, I would assume time's in there too). There are ways they can hack around this, like writing code that performs the math.

But how would you do that for chronological reasoning? Because that would help with compacting context to know what to remember and what not.

replies(2): >>45388155 #>>45389376 #

14. magicalhippo ◴[26 Sep 25 15:54 UTC] No.45387930{3}[source]▶

>>45387751 #

I'm not an expert but it seemed fairly reasonable to me that a hierarchical model would be needed to approach what humans can do, as that's basically how we process data as well.

That is, humans usually don't store exactly what was written in as sentence five paragraphs ago, but rather the concept or idea conveyed. If we need details we go back and reread or similar.

And when we write or talk, we form first an overall thought about what to say, then we break it into pieces and order the pieces somewhat logically, before finally forming words that make up sentences for each piece.

From what I can see there's work on this, like this[1] and this[2] more recent paper. Again not an expert so can't comment on the quality of the references, just some I found.

[1]: https://aclanthology.org/2022.findings-naacl.117/

[2]: https://aclanthology.org/2025.naacl-long.410/

15. dragonwriter ◴[26 Sep 25 15:57 UTC] No.45387974{3}[source]▶

>>45387838 #

This is where the distinction between “an LLM” and “a user-facing system backed by an LLM” becomes important; the latter is often much more than a naive system for maintaining history and reprompting the LLM with added context from new user input, and could absolutely incorporate a step which (using the same LLM with different prompting or completely different tooling) edited the context before presenting it to the LLM to generate the response to the user. And such a system could, by that mechanism, “forget” selected context in the process.

replies(2): >>45388257 #>>45388827 #

16. davedx ◴[26 Sep 25 15:59 UTC] No.45387999{3}[source]▶

>>45387838 #

Yeah I start a new session to mitigate this. Don’t keep hammering away - close the current chat/session whatever and restate the problem carefully in a new one.

replies(2): >>45388047 #>>45388661 #

17. AnotherGoodName ◴[26 Sep 25 16:01 UTC] No.45388022{3}[source]▶

>>45387769 #

I do take that as a sign to refactor when it happens though. Even if not for the sake of LLM compatibility with the codebase it cuts down merge conflicts to refactor large files.

In fact I've found LLMs are reasonable at the simple task of refactoring a large file into smaller components with documentation on what each portion does even if they can't get the full context immediately. Doing this then helps the LLM later. I'm also of the opinion we should be making codebases LLM compatible. So if it happens i direct the LLM that way for 10mins and then get back to the actual task once the codebase is in a more reasonable state.

replies(1): >>45388498 #

18. atonse ◴[26 Sep 25 16:02 UTC] No.45388024[source]▶

>>45387672 #

I've found situations where a file was too big, and then it tries to grep for what might be useful in that file.

I could see in C++ it getting smarter about first checking the .h files or just grepping for function documentation, before actually trying to pull out parts of the file.

replies(1): >>45389593 #

19. cjbgkagh ◴[26 Sep 25 16:03 UTC] No.45388047{4}[source]▶

>>45387999 #

There should be a simple button that allows you refine the context. A fresh LLM could generate a new context from the input and outputs of the chat history, then another fresh LLM can start over with that context.

replies(3): >>45388179 #>>45388238 #>>45388840 #

20. loudmax ◴[26 Sep 25 16:16 UTC] No.45388155{3}[source]▶

>>45387912 #

LLMs certainly don't experience time like we do. They live in a uni-dimensional world that consists of a series of tokens (though it gets more nuanced if you account for multi-modal or diffusion models). They pick up some sense of ordering from their training data, such as "disregard my previous instruction," but it's not something they necessarily understand intuitively. Fundamentally, they're just following whatever patterns happen to be in their training data.

21. adastra22 ◴[26 Sep 25 16:17 UTC] No.45388179{5}[source]▶

>>45388047 #

/compact in Claude Code.

22. moffkalast ◴[26 Sep 25 16:20 UTC] No.45388198{3}[source]▶

>>45387838 #

That's not how attention works though, it should be perfectly able to figure out which parts are important and which aren't, but the problem is that it doesn't really scale beyond small contexts and works on a token to token basis instead of being hierarchical with sentences, paragraphs and sections. The only models that actually do long context do so by skipping attention layers or doing something without attention or without positional encodings, all leading to shit performance. Nobody pretrains on more than like 8k, except maybe Google who can throw TPUs at the problem.

23. jofla_net ◴[26 Sep 25 16:21 UTC] No.45388215{3}[source]▶

>>45387838 #

Relax friend! I can't see why you'd be peeved in the slightest! Remember, the CEOs have it all figured out and have 'determined' that we don't need all those eyeballs on the code anymore. You can simply 'feed' the machine and do the work of forty devs! This is the new engineering! /s

24. sheerun ◴[26 Sep 25 16:22 UTC] No.45388228[source]▶

>>45387614 (TP) #

Could be, but it's not. As soon as it will be infinite new brand of solutions will emerge

25. bongodongobob ◴[26 Sep 25 16:22 UTC] No.45388231{3}[source]▶

>>45387769 #

Interestingly, this issue has caused me to refactor and modularize code that I should have addressed a long time ago, but didn't have the time or stamina to tackle. Because the LLM can't handle the context, it has helped me refactor stuff (seems to be very good at this in my experience) and that has led me to write cleaner and more modular code that the LLMs can better handle.

26. pulvinar ◴[26 Sep 25 16:23 UTC] No.45388238{5}[source]▶

>>45388047 #

It's easy to miss: ChatGPT now has a "branch to new chat" option to branch off from any reply.

27. yggdrasil_ai ◴[26 Sep 25 16:24 UTC] No.45388257{4}[source]▶

>>45387974 #

I have been building Yggdrasil for that exact purpose - https://github.com/zayr0-9/Yggdrasil

28. tptacek ◴[26 Sep 25 16:25 UTC] No.45388271[source]▶

>>45387614 (TP) #

Asking, not arguing, but: why can't they? You can give an agent access to its own context and ask it to lobotomize itself like Eternal Sunshine. I just did that with a log ingestion agent (broad search to get the lay of the land, which eats a huge chunk of the context window, then narrow searches for weird stuff it spots, then go back and zap the big log search). I assume this is a normal approach, since someone else suggested it to me.

replies(2): >>45388348 #>>45388456 #

29. yggdrasil_ai ◴[26 Sep 25 16:27 UTC] No.45388296{3}[source]▶

>>45387751 #

>extension to transformers that can focus the attention on just the relevant context.

That is what transformers attention does in the first place, so you would just be stacking two transformers.

30. AlGoreRhythm ◴[26 Sep 25 16:28 UTC] No.45388311[source]▶

>>45387672 #

Out of curiosity, how would you rate an LLM’s ability to deal with pointers in C++ code?

replies(2): >>45389012 #>>45389408 #

31. simonw ◴[26 Sep 25 16:31 UTC] No.45388348[source]▶

>>45388271 #

This is also the idea behind sub-agents. Claude Code answers questions about things like "where is the code that does X" by firing up a separate LLM running in a fresh context, posing it the question and having it answer back when it finds the answer. https://simonwillison.net/2025/Jun/2/claude-trace/

replies(2): >>45388378 #>>45388417 #

32. tptacek ◴[26 Sep 25 16:35 UTC] No.45388378{3}[source]▶

>>45388348 #

I'm playing with that too (everyone should write an agent; basic sub-agents are incredibly simple --- just tool calls that can make their own LLM calls, or even just a tool call that runs in its own context window). What I like about Eternal Sunshine is that the LLM can just make decisions about what context stuff matters and what doesn't, which is a problem that comes up a lot when you're looking at telemetry data.

33. tra3 ◴[26 Sep 25 16:38 UTC] No.45388417{3}[source]▶

>>45388348 #

I keep wondering if we're forgetting the fundamentals:

> Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?

https://www.laws-of-software.com/laws/kernighan/

Sure, you eat the elephant one bite at a time, and recursion is a thing but I wonder where the tipping point here is.

replies(1): >>45388460 #

34. libraryofbabel ◴[26 Sep 25 16:41 UTC] No.45388456[source]▶

>>45388271 #

Yes! - and I wish this was easier to do with common coding agents like Claude Code. Currently you can kind of do it manually by copying the results of the context-busting search, rewinding history (Esc Esc) to remove the now-useless stuff, and then dropping in the results.

Of course, subagents are a good solution here, as another poster already pointed out. But it would be nice to have something more lightweight and automated, maybe just turning on a mode where the LLM is asked to throw things out according to its own judgement, if you know you're going to be doing work with a lot of context pollution.

replies(1): >>45388468 #

35. tptacek ◴[26 Sep 25 16:41 UTC] No.45388460{4}[source]▶

>>45388417 #

I think recursion is the wrong way to look at this, for what it's worth.

replies(1): >>45388565 #

36. tptacek ◴[26 Sep 25 16:42 UTC] No.45388468{3}[source]▶

>>45388456 #

This is why I'm writing my own agent code instead of using simonw's excellent tools or just using Claude; the most interesting decisions are in the structure of the LLM loop itself, not in how many random tools I can plug into it. It's an unbelievably small amount of code to get to the point of super-useful results; maybe like 1500 lines, including a TUI.

replies(1): >>45390488 #

37. cma ◴[26 Sep 25 16:43 UTC] No.45388477[source]▶

>>45387700 #

Not that this shouldn't be fixed in the model, but you can jump to an earlier point in claude code and on web chat interfaces to get it out of the context, just sometimes you have other important stuff you don't want it to lose.

replies(2): >>45388813 #>>45394269 #

38. Someone1234 ◴[26 Sep 25 16:44 UTC] No.45388498{4}[source]▶

>>45388022 #

I'm trying to use LLMs to save me time and resources, "refactor your entire codebase, so the tool can work" is the opposite of that. Regardless of how you rationalize it.

replies(1): >>45388933 #

39. keeda ◴[26 Sep 25 16:48 UTC] No.45388542{3}[source]▶

>>45387838 #

> I didn't get into this career to be a fucking tech priest for the machine god.

You may appreciate this illustration I made (largely with AI, of course): https://imgur.com/a/0QV5mkS

The context (heheheh) is a long-ass article on coding with AI I wrote eons ago that nobody ever read, if anybody is curious: https://news.ycombinator.com/item?id=40443374

Looking back at it, I was off on a few predictions but a number of them are coming true.

40. tra3 ◴[26 Sep 25 16:51 UTC] No.45388565{5}[source]▶

>>45388460 #

Recursion and memoization only as a general approach to solving "large" problems.

I really want to paraphrase kernighan's law as applied to LLMs. "If you use your whole context window to code a solution to a problem, how are you going to debug it?".

replies(1): >>45388913 #

41. sethhochberg ◴[26 Sep 25 16:59 UTC] No.45388661{4}[source]▶

>>45387999 #

I've had great luck with asking the current session to "summarize our goals, conversation, and other relevant details like git commits to this point in a compact but technically precise way that lets a new LLM pick up where we're leaving off".

The new session throws away whatever behind-the-scenes context was causing problems, but the prepared prompt gets the new session up and running more quickly especially if picking up in the middle of a piece of work that's already in progress.

replies(1): >>45388986 #

42. raincole ◴[26 Sep 25 16:59 UTC] No.45388664[source]▶

>>45387614 (TP) #

Humans have a very strong tendency (and have made tremendous collective efforts) to compress context. I'm not a neuroscientist but I believe it's called "chunk."

Language itself is a highly compressed form of compressed context. Like when you read "hoist with one's own petard" you don't just think about literal petard but the context behind this phrase.

replies(1): >>45388852 #

43. PantaloonFlames ◴[26 Sep 25 17:17 UTC] No.45388813{3}[source]▶

>>45388477 #

Likewise Gemini CLI. There’s a way to backup to a prior state in the dialogue.

44. PantaloonFlames ◴[26 Sep 25 17:18 UTC] No.45388827{4}[source]▶

>>45387974 #

At least a few of the current coding agents have mechanisms that do what you describe.

45. PantaloonFlames ◴[26 Sep 25 17:20 UTC] No.45388840{5}[source]▶

>>45388047 #

You are saying “fresh LLM” but really I think you’re referring to a curated context. The existing coding agents have mechanisms to do this. Saving context to a file. Editing the file. Clearing all context except for the file. It’s sort of clunky now but it will get better and slicker.

replies(1): >>45389185 #

46. PantaloonFlames ◴[26 Sep 25 17:21 UTC] No.45388852[source]▶

>>45388664 #

We don’t think of petards because no one knows what that is. :)

replies(1): >>45390018 #

47. spyder ◴[26 Sep 25 17:23 UTC] No.45388863{3}[source]▶

>>45387838 #

This is false:

"that's because a next token predictor can't "forget" context. That's just not how it works."

An LSTM is also a next token predictor and literally have a forget gate, and there are many other context compressing models too which remember only the what it thinks is important and forgets the less important, like for example: state-space models or RWKV that work well as LLMs too. But even just a the basic GPT model forgets old context since it's gets truncated if it cannot fit, but that's not really the learned smart forgetting the other models do.

48. tptacek ◴[26 Sep 25 17:28 UTC] No.45388913{6}[source]▶

>>45388565 #

By checkpointing once the agent loop has decided it's ready to hand off a solution, generating a structured summary of all the prior elements in the context, writing that to a file, and then marking all those prior context elements as dead so they don't occupy context window space.

Look carefully at a context window after solving a large problem, and I think in most cases you'll see even the 90th percentile token --- to say nothing of the median --- isn't valuable.

However large we're allowing frontier model context windows to get, we've got integer multiple more semantic space to allocate if we're even just a little bit smart about managing that resource. And again, this is assuming you don't recurse or divide the problem into multiple context windows.

49. thunky ◴[26 Sep 25 17:30 UTC] No.45388933{5}[source]▶

>>45388498 #

It may be a good idea to refactor even if not for LLMs but for humans sake.

replies(1): >>45389095 #

50. sdesol ◴[26 Sep 25 17:32 UTC] No.45388965[source]▶

>>45387614 (TP) #

> A human can effectively discard or disregard prior information as the narrow window of focus moves to a new task, LLMs seem incredibly bad at this.

This is how I designed my LLM chat app (https://github.com/gitsense/chat). I think agents have their place, but I really think if you want to solve complex problems without needlessly burning tokens, you will need a human in the loop to curate the context. I will get to it, but I believe in the same way that we developed different flows for working with Git, we will have different 'Chat Flows' for working with LLMs.

I have an interactive demo at https://chat.gitsense.com which shows how you can narrow the focus of the context for the LLM. Click "Start GitSense Chat Demos" then "Context Engineering & Management" to go through the 30 second demo.

51. DenisM ◴[26 Sep 25 17:34 UTC] No.45388986{5}[source]▶

>>45388661 #

Wow, I had useless results asking “please summarize important points of the discussion” from ChatGPT. It just doesn’t understand what’s important, and instead of highlighting pivoting moments of the conversation it produce a high level introduction for a non-practitioner.

Can you share you prompt?

replies(1): >>45389913 #

52. neutronicus ◴[26 Sep 25 17:36 UTC] No.45389012{3}[source]▶

>>45388311 #

Greenfield project? Claude is fucking great at C++. Almost all aspects of it, really.

Well, not so much the project organization stuff - it wants to stuff everything into one header and has to be browbeaten into keeping implementations out of headers.

But language semantics? It's pretty great at those. And when it screws up it's also really good at interpreting compiler error messages.

53. Someone1234 ◴[26 Sep 25 17:44 UTC] No.45389095{6}[source]▶

>>45388933 #

Right, but the discussion we're having here is context size. I, and others, are saying that the current context size is a limitation on when they can use the tool to be useful.

The replies of "well, just change the situation, so context doesn't matter" is irrelevant, and off-topic. The rationalizations even more so.

replies(1): >>45390931 #

54. cjbgkagh ◴[26 Sep 25 17:52 UTC] No.45389185{6}[source]▶

>>45388840 #

It seems that I have missed this existing feature, I’m only a light user of LLMs, I’ll keep an eye out for it.

replies(1): >>45391924 #

55. notatoad ◴[26 Sep 25 18:01 UTC] No.45389266[source]▶

>>45387614 (TP) #

i think that's really just a misunderstanding of what "bottleneck" means. a bottleneck isn't an obstacle where overcoming it will allow you to realize unlimited potential, a bottleneck is always just an obstacle to finding the next constraint.

on actual bottles without any metaphors, the bottle neck is narrower because humans mouths are narrower.

56. sebastiennight ◴[26 Sep 25 18:13 UTC] No.45389376{3}[source]▶

>>45387912 #

All it sees is a big blob of text, some of which can be structured to differentiate turns between "assistant", "user", "developer" and "system".

In theory you could attach metadata (with timestamps) to these turns, or include the timestamp in the text.

It does not affect much, other than giving the possibility for the model to make some inferences (eg. that previous message was on a different date, so its "today" is not the same "today" as in the latest message).

To chronologically fade away the importance of a conversation turn, you would need to either add more metadata (weak), progressively compact old turns (unreliable) or post-train a model to favor more recent areas of the context.

57. jdrek1 ◴[26 Sep 25 18:16 UTC] No.45389408{3}[source]▶

>>45388311 #

If you have lots of pointers, you're writing C, not C++.

replies(1): >>45389622 #

58. neutronicus ◴[26 Sep 25 18:33 UTC] No.45389593{3}[source]▶

>>45388024 #

Yeah, my first instinct has been to expose an LSP server as a tool so the LLM can avoid reading entire 40,000 line files just to get the implementation of one function.

I think with appropriate instructions in the system prompt it could probably work on this code-base more like I do (heavy use of Ctrl-, in Visual Studio to jump around and read only relevant portions of the code-base).

59. neutronicus ◴[26 Sep 25 18:35 UTC] No.45389622{4}[source]▶

>>45389408 #

Eh, it's a big tent

60. sethhochberg ◴[26 Sep 25 19:06 UTC] No.45389913{6}[source]▶

>>45388986 #

Honestly, I just type out something by hand that is roughly like what I quoted above - I'm not big on keeping prompt libraries.

I think the important part is to give it (in my case, these days "it" is gpt-5-codex) a target persona, just like giving it a specific problem instead of asking it to be clever or creative. I've never asked it for a summary of a long conversation without the context of why I want the summary and who the intended audience is, but I have to imagine that helps it frame its output.

61. KineticLensman ◴[26 Sep 25 19:17 UTC] No.45390018{3}[source]▶

>>45388852 #

For anyone wondering, it means blown into the air (‘hoist’) by your own bomb (‘petard’). From Shakespeare

62. ericmcer ◴[26 Sep 25 19:44 UTC] No.45390299[source]▶

>>45387700 #

It seems possible for openAI/Anthropic to rework their tools so they discard/add relevant context on the fly, but it might have some unintended behaviors.

The main thing is people have already integrated AI into their workflows so the "right" way for the LLM to work is the way people expect it to. For now I expect to start multiple fresh contexts while solving a single problem until I can setup a context that gets the result I want. Changing this behavior might mess me up.

replies(2): >>45391352 #>>45391400 #

63. libraryofbabel ◴[26 Sep 25 20:04 UTC] No.45390488{4}[source]▶

>>45388468 #

And even if you do use Claude for actual work, there is also immense pedagogical value in writing an agent from scratch. Something really clicks when you actually write the LLM + tool calls loop yourself. I ran a workshop on this at my company and we wrote a basic CLI agent in only 120 lines of Python, with just three tools: list files, read file, and (over)write file. (At that point, the agent becomes capable enough that you can set it to modifying itself and ask it to add more tools!) I think it was an eye-opener for a lot of people to see what the core of these things looks like. There is no magic dust in the agent; it's all in the LLM black box.

I hadn't considered actually rolling my own for day-to-day use, but now maybe I will. Although it's worth noting that Claude Code Hooks do give you the ability to insert your own code into the LLM loop - though not to the point of Eternal Sunshining your context, it's true.

replies(1): >>45400445 #

64. reissbaker ◴[26 Sep 25 20:17 UTC] No.45390619[source]▶

>>45387700 #

IMO specifically OpenAI's models are really bad at being steered once they've decided to do something dumb. Claude and OSS models tend to take feedback better.

GPT-5 is brilliant when it oneshots the right direction from the beginning, but pretty unmanageable when it goes off the rails.

65. Mikhail_Edoshin ◴[26 Sep 25 20:27 UTC] No.45390695{3}[source]▶

>>45387838 #

Well, "a sufficiently advanced technology is indistinguishable from magic". It's just that it is same in a bad way, not a good way.

66. thunky ◴[26 Sep 25 20:54 UTC] No.45390931{7}[source]▶

>>45389095 #

A huge context is a problem for humans too, which is why I think it's fair to suggest maybe the tool isn't the (only) problem.

Tools like Aider create a code map that basically indexes code into a small context. Which I think is similar to what we humans do when we try to understand a large codebase.

I'm not sure if Aider can then load only portions of a huge file on demand, but it seems like that should work pretty well.

replies(1): >>45395565 #

67. hadlock ◴[26 Sep 25 20:55 UTC] No.45390936{3}[source]▶

>>45387769 #

I've started getting in the habit of finding seams in files > 1500 lines long. Occasionally it is unavoidable, but very regularly.

68. sshine ◴[26 Sep 25 21:48 UTC] No.45391352{3}[source]▶

>>45390299 #

> rework their tools so they discard/add relevant context on the fly

That may be the foundation for an innovation step in model providers. But you can achieve a poor man’s simulation if you can determine, in retrospect, when a context was at peak for taking turns, and when it got too rigid, or too many tokens were spent, and then simply replay the context up until that point.

I don’t know if evaluating when a context is worth duplicating is a thing; it’s not deterministic, and it depends on enforcing a certain workflow.

69. vel0city ◴[26 Sep 25 21:54 UTC] No.45391400{3}[source]▶

>>45390299 #

A number of agentic coding tools do this. Upon an initial request for a larger set of actions, it will write a markdown file with its "thoughts" on its plan to do something, and keep notes as it goes. They'll then automatically compact their contexts and re-read their notes to keep "focused" while still having a bit of insight on what it did previously and what the original ask was.

replies(2): >>45391466 #>>45404113 #

70. cvzakharchenko ◴[26 Sep 25 22:02 UTC] No.45391466{4}[source]▶

>>45391400 #

Interesting. I know people do this manually. But are there agentic coding tools that actually automate this approach?

replies(2): >>45391876 #>>45392793 #

71. sshine ◴[26 Sep 25 22:58 UTC] No.45391876{5}[source]▶

>>45391466 #

Claude Code has /init and /compact that do this. It doesn’t recreate the context as-is, but creates a context that is presumed to be functionally equivalent. I find that’s not the case and that building up from very little stored context and a lot of specialised dialogue works better.

72. fzzzy ◴[26 Sep 25 23:04 UTC] No.45391924{7}[source]▶

>>45389185 #

some sibling comments mentioned Claude code has this

73. vel0city ◴[27 Sep 25 02:16 UTC] No.45392793{5}[source]▶

>>45391466 #

I've seen this behavior with Cursor, Windsurf, and Amazon Q. It normally only does it for very large requests from what I've seen.

74. Nerdx86 ◴[27 Sep 25 09:01 UTC] No.45394242[source]▶

>>45387700 #

So this is where having subagents fed specific curated context is a help.. As long as the "poisoned" agent can focus long enough to generate a clean request to the subagent, the subagent works posion-free. This is much more likely than a single agent setup with the token by token process of a transformer.

The same protection works in reverse, if a subagent goes off the rails and either self aborts or is aborted, that large context is truncated to the abort response which is "salted" with the fact that this was stopped. Even if the subagent goes sideways and still returns success (Say separate dev, review, and test subagents) the main agent has another opportunity to compare the response and the product against the main context or to instruct a subagent to do it in a isolated context..

Not perfect at all, but better than a single context.

One other thing, there is some consensus that "don't" "not" "never" are not always functional in context. And that is a big problem. Anecdotally and experimental, many (including myself) have seen the agent diligently performing the exact thing following a "never" once it gets far enough back in the context. Even when it's a less common action.

75. Nerdx86 ◴[27 Sep 25 09:06 UTC] No.45394269{3}[source]▶

>>45388477 #

The other issue with this is that if you jump back and it has edited code, it loses the context of those edits.. It may have previous versions of the code in memory and no knowledge of the edits leading to other edits that no longer align.. Often it's better to just /clear.. :/

76. KronisLV ◴[27 Sep 25 13:30 UTC] No.45395565{8}[source]▶

>>45390931 #

As someone who's worked with both more fragmented/modular codebases with smaller classes and shorter files vs ones that span thousands of lines (sometimes even double digits), I very much prefer the former and hate the latter.

That said, some of the models out there (Gemini 2.5 Pro, for example) support 1M context; it's just going to be expensive and will still probably confuse the model somewhat when it comes to the output.

77. JambalayaJimbo ◴[28 Sep 25 00:01 UTC] No.45400445{5}[source]▶

>>45390488 #

Do you have this workshop available online? I’m really struggling to understand what “tool calls” and MCP are!

78. tom_m ◴[28 Sep 25 13:12 UTC] No.45404093[source]▶

>>45387614 (TP) #

You don't want to discard prior information though. That's the problem with small context windows. Humans don't forget the original request as they ask for more information or go about a long task. Humans may forget parts of information along the way, but not the original goal and important parts. Not unless they have comprehension issues or ADHD, etc.

This isn't a misconception. Context is a limitation. You can effectively have an AI agent build an entire application with a single prompt if it has enough (and the proper) context. The models with 1m context windows do better. Models with small context windows can't even do the task in many cases. I've tested this many, many, many times. It's tedious, but you can find the right model and the right prompts for success.

79. tom_m ◴[28 Sep 25 13:14 UTC] No.45404113{4}[source]▶

>>45391400 #

This does help, yes. Todo lists are important. They also reinforce order of operations.

↑