The real problem is having more context but still being unable to focus effectively on the latest task.
You load the thing up with relevant context and pray that it guides the generation path to the part of the model that represents the information you want, and pray that the path of tokens through the model outputs what you want.
That's why they have a tendency to go ahead and do things you tell them not to do.
Also, IDK about you, but I hate how much praying has become part of the state of the art here. I didn't get into this career to be a fucking tech priest for the machine god. I will never like these models until they are predictable, which means I will never like them.
You may appreciate this illustration I made (largely with AI, of course): https://imgur.com/a/0QV5mkS
The context (heheheh) is a long-ass article on coding with AI I wrote eons ago that nobody ever read, if anybody is curious: https://news.ycombinator.com/item?id=40443374
Looking back at it, I was off on a few predictions but a number of them are coming true.
The new session throws away whatever behind-the-scenes context was causing problems, but the prepared prompt gets the new session up and running more quickly, especially if you're picking up in the middle of a piece of work that's already in progress.
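A minimal sketch of what I mean (the helper and the summary format here are my own invention, not any particular tool's API):

    # Hypothetical sketch: distill the old session into a bootstrap
    # prompt so a fresh session starts with clean context.

    def handoff_prompt(goal, done, next_steps):
        lines = [f"Goal: {goal}", "Already done:"]
        lines += [f"- {item}" for item in done]
        lines += ["Next steps:"]
        lines += [f"- {item}" for item in next_steps]
        return "\n".join(lines)

    # The old session's junk context is gone; only the distilled state
    # carries over into the new session's first message.
    fresh_messages = [{"role": "user", "content": handoff_prompt(
        "Migrate the billing service to the new API",
        ["ported the client wrapper", "updated unit tests"],
        ["fix the retry logic", "run the integration suite"],
    )}]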
"that's because a next token predictor can't "forget" context. That's just not how it works."
An LSTM is also a next-token predictor and literally has a forget gate, and there are many other context-compressing models that remember only what they think is important and forget the less important parts: state-space models or RWKV, for example, which work well as LLMs too. But even the basic GPT model forgets old context, since it gets truncated when it can't fit; that's just not the learned, smart forgetting the other models do.
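For reference, the forget gate is just a learned sigmoid mask over the cell state; here's a minimal numpy sketch (textbook shapes and names, not from any particular library), contrasted with the hard truncation a plain GPT does:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_forget_step(c_prev, h_prev, x, W_f, b_f):
        # Forget gate: f is in (0, 1) per cell. Values near 0 erase old
        # state, values near 1 keep it, and it's learned, not hardcoded.
        z = np.concatenate([h_prev, x])
        f = sigmoid(W_f @ z + b_f)
        return f * c_prev  # old memory, selectively kept

    def truncate_context(tokens, max_len):
        # GPT-style "forgetting" by contrast: a hard cutoff of the
        # oldest tokens once the window is full. Nothing learned here.
        return tokens[-max_len:]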
Can you share your prompt?
I think the important part is to give it (in my case, these days "it" is gpt-5-codex) a target persona, just like giving it a specific problem instead of asking it to be clever or creative. I've never asked it for a summary of a long conversation without the context of why I want the summary and who the intended audience is, but I have to imagine that helps it frame its output.
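Concretely, something like this (purely illustrative wording, not a recipe):

    # Framing the summary request with a persona and an intended
    # audience, instead of a bare "summarize this conversation".
    summary_request = {
        "role": "user",
        "content": (
            "You are a senior engineer handing this work off to a teammate. "
            "Summarize our conversation as a handoff note: what we're "
            "building, the decisions made and why, and the open questions. "
            "The reader picks up the task cold tomorrow morning."
        ),
    }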
The main thing is that people have already integrated AI into their workflows, so the "right" way for the LLM to work is the way people expect it to. For now I expect to start multiple fresh contexts while solving a single problem, until I can set up a context that gets the result I want. Changing this behavior might mess me up.
GPT-5 is brilliant when it oneshots the right direction from the beginning, but pretty unmanageable when it goes off the rails.
That may be the foundation for an innovation step for model providers. But you can get a poor man's simulation if you can determine, in retrospect, when a context was at its peak for taking turns, and when it got too rigid or burned too many tokens, and then simply replay the context up to that point.
I don’t know if evaluating when a context is worth duplicating is a thing; it’s not deterministic, and it depends on enforcing a certain workflow.
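Mechanically, the poor man's version is just snapshotting the message list at good moments and replaying a prefix; a rough sketch (deciding which checkpoint is "best" is the hand-wavy part):

    import copy

    checkpoints = []  # conversation snapshots taken at "good" turns

    def checkpoint(messages):
        checkpoints.append(copy.deepcopy(messages))

    def replay_from_best():
        # The hard, non-deterministic part is deciding which snapshot
        # was the peak; naively, take the most recent one before things
        # went rigid or the token budget blew up.
        return copy.deepcopy(checkpoints[-1]) if checkpoints else []

    # Usage: call checkpoint(messages) after each turn that still feels
    # on track, then messages = replay_from_best() instead of starting
    # a context from scratch.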
The same protection works in reverse: if a subagent goes off the rails and either self-aborts or is aborted, that large context is truncated to the abort response, which is "salted" with the fact that it was stopped. Even if the subagent goes sideways and still returns success (say, separate dev, review, and test subagents), the main agent has another opportunity to compare the response and the product against the main context, or to instruct a subagent to do it in an isolated context.
Not perfect at all, but better than a single context.
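Sketched out, the point is that the subagent's large context never crosses back into the main agent; only a bounded, marked report does (all names here are hypothetical):

    # Hypothetical sketch: the subagent runs in its own context and the
    # main agent only ever sees a truncated, "salted" report of it.

    def run_subagent(task, subagent):
        try:
            result = subagent(task)  # executes in an isolated context
        except RuntimeError as abort:
            # Truncate to the abort response and salt it so the main
            # agent knows this path was stopped, not completed.
            return f"[ABORTED] {abort}"
        # Even on claimed success, the main agent gets a bounded report
        # it can cross-check against the main context, or hand off to a
        # separate review subagent.
        return f"[SUBAGENT REPORT] {str(result)[:2000]}"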
One other thing: there is some consensus that "don't", "not", and "never" are not always functional in context, and that is a big problem. Anecdotally and experimentally, many (including myself) have seen the agent diligently perform the exact thing that follows a "never" once it gets far enough back in the context, even when it's a less common action.