
196 points by zmccormick7 | 10 comments
aliljet No.45387614
There's a broad misunderstanding here. Context could be infinite, but the real bottleneck is understanding intent late in a multi-step operation. A human can effectively discard or disregard prior information as the narrow window of focus moves to a new task; LLMs seem incredibly bad at this.

Having more context while remaining unable to focus effectively on the latest task is the real problem.

replies(10): >>45387639, >>45387672, >>45387700, >>45387992, >>45388228, >>45388271, >>45388664, >>45388965, >>45389266, >>45404093
1. ray__ No.45387639
This is a great insight. Any thoughts on how to address this problem?
replies(3): >>45387751, >>45387782, >>45387912
2. throwup238 No.45387751
It has to be addressed architecturally with some sort of extension to transformers that can focus the attention on just the relevant context.

People have tried to expand context windows by reducing the O(n^2) attention mechanism to something more sparse, and it tends to perform very poorly. It will take a fundamental architectural change.
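
To make the O(n^2) point concrete, here's a toy NumPy sketch (purely illustrative, not from any particular paper) comparing full attention with a sliding-window variant, which is roughly the kind of sparsification that's been tried: each query only scores a fixed number of neighboring keys, so the cost drops from O(n^2) to O(n*w).

  # Toy comparison of full vs. sliding-window attention (single head,
  # no batching); illustrative only, not a real transformer layer.
  import numpy as np

  def softmax(x, axis=-1):
      x = x - x.max(axis=axis, keepdims=True)
      e = np.exp(x)
      return e / e.sum(axis=axis, keepdims=True)

  def full_attention(Q, K, V):
      # Every query scores every key: O(n^2) time and memory.
      scores = Q @ K.T / np.sqrt(Q.shape[-1])
      return softmax(scores) @ V

  def sliding_window_attention(Q, K, V, w=4):
      # Each query only looks at itself and the w previous tokens,
      # so the work is O(n * w) instead of O(n^2).
      n, d = Q.shape
      out = np.zeros_like(V)
      for i in range(n):
          lo = max(0, i - w)
          scores = Q[i] @ K[lo:i + 1].T / np.sqrt(d)
          out[i] = softmax(scores) @ V[lo:i + 1]
      return out

  n, d = 16, 8
  rng = np.random.default_rng(0)
  Q, K, V = rng.normal(size=(3, n, d))
  print(full_attention(Q, K, V).shape)            # (16, 8)
  print(sliding_window_attention(Q, K, V).shape)  # (16, 8)

The catch is that the token you actually need is often not in the local window, which is part of why these schemes tend to perform poorly.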

replies(3): >>45387795, >>45387930, >>45388296
3. aliljet No.45387782
For me? It's simple: completely empty the context and rebuild it, focused on the new task at hand. It's painful, but very effective.
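
In code terms, that amounts to something like this sketch (hypothetical reset_context helper and made-up example content, OpenAI-style message dicts assumed; no real API call is made):

  # Hypothetical sketch of "empty the context and rebuild": no real API
  # call here, just constructing a fresh OpenAI-style message list.

  def reset_context(system_prompt, task_description, relevant_snippets):
      """Start a brand-new message list for the new task, carrying over
      nothing from the old conversation except hand-picked material."""
      messages = [{"role": "system", "content": system_prompt}]
      for label, text in relevant_snippets.items():
          messages.append({"role": "user",
                           "content": f"Reference ({label}):\n{text}"})
      messages.append({"role": "user", "content": task_description})
      return messages

  # Instead of appending the new task to a huge history, rebuild from
  # scratch with only what the task needs.
  messages = reset_context(
      system_prompt="You are a coding assistant.",
      task_description="Rename parse_config to load_config everywhere.",
      relevant_snippets={"config.py": "...only the relevant excerpt..."},
  )
  print(len(messages))  # 3
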
4. buddhistdude No.45387795
Can one instruct an LLM to pick the parts of the context that will be relevant going forward? And then discard the existing context, replacing it with the new 'summary'?
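
Something like this minimal sketch of the idea, say (a hypothetical complete() callable standing in for whatever chat API is used):

  # Minimal compaction sketch. complete(messages) is a placeholder for
  # whatever function sends a message list to the model and returns text.

  COMPACT_PROMPT = (
      "Summarize this conversation for a future session. Keep only what is "
      "needed to continue the current task: decisions made, open questions, "
      "and relevant file or variable names. Drop everything else."
  )

  def compact(messages, complete):
      summary = complete(messages + [{"role": "user", "content": COMPACT_PROMPT}])
      # Throw away the whole history and replace it with the model's own
      # summary of it, keeping only the original system prompt.
      return [
          messages[0],
          {"role": "user", "content": f"Summary of the session so far:\n{summary}"},
      ]

  # e.g. with a stub: compact(history, complete=lambda msgs: "stub summary")

Though presumably the summarizer has to guess which details the next task will actually need, which is where this could fall apart.
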
6. atonse No.45387912
Do we know if LLMs understand the concept of time? (Like: I told you this in the past, but what I told you later should supersede it.)

I know there are classes of problems that LLMs can't natively handle (like doing math, even simple addition... or spatial reasoning; I would assume time is in there too). There are ways they can hack around this, like writing code that performs the math.

But how would you do that for chronological reasoning? Because that would help with compacting context: knowing what to remember and what not.
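
For the raw date arithmetic at least, I assume the same code-writing hack applies; something like this hypothetical tool function a model could call:

  # Hypothetical "time tool" an agent could call instead of reasoning
  # about dates in its head, analogous to offloading arithmetic to code.
  from datetime import datetime

  def compare_timestamps(a_iso: str, b_iso: str) -> str:
      """Say which of two ISO-8601 timestamps is more recent, and by how much."""
      a, b = datetime.fromisoformat(a_iso), datetime.fromisoformat(b_iso)
      if a == b:
          return "same moment"
      newer, older = (a, b) if a > b else (b, a)
      return f"{newer.isoformat()} is more recent, by {newer - older}"

  print(compare_timestamps("2024-05-01T10:00:00", "2024-06-03T10:00:00"))
  # -> 2024-06-03T10:00:00 is more recent, by 33 days, 0:00:00

But the supersede part, knowing that a later instruction overrides an earlier one, still has to come from the model or the harness, not from a calculator.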

replies(2): >>45388155, >>45389376
7. magicalhippo No.45387930
I'm not an expert but it seemed fairly reasonable to me that a hierarchical model would be needed to approach what humans can do, as that's basically how we process data as well.

That is, humans usually don't store exactly what was written in a sentence five paragraphs ago, but rather the concept or idea conveyed. If we need the details, we go back and reread, or similar.

And when we write or talk, we first form an overall thought about what to say, then break it into pieces and order the pieces somewhat logically, before finally forming the words that make up sentences for each piece.
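
As a toy illustration of that "keep the gist, go back and reread when needed" idea (my own sketch with a made-up GistMemory class, not how any real system does it):

  # Toy "gist plus re-read" memory: keep a short summary of each chunk in
  # working context, keep the full text on the side, and only pull the
  # full text back in when its gist looks relevant to the current query.

  class GistMemory:
      def __init__(self, summarize):
          self.summarize = summarize      # callable: full text -> short gist
          self.entries = []               # list of (gist, full_text) pairs

      def add(self, full_text):
          self.entries.append((self.summarize(full_text), full_text))

      def gists(self):
          # This is what would stay in the context window.
          return [gist for gist, _ in self.entries]

      def reread(self, query_words):
          # "Go back and reread": fetch full text whose gist matches the query.
          return [full for gist, full in self.entries
                  if any(w.lower() in gist.lower() for w in query_words)]

  # Usage with a trivial stand-in summarizer (first sentence only):
  mem = GistMemory(summarize=lambda t: t.split(".")[0])
  mem.add("The config loader reads YAML. It also expands env vars and caches results.")
  mem.add("Logging goes to stderr. Verbosity is controlled by the -v flag.")
  print(mem.gists())
  print(mem.reread(["config"]))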

From what I can see there's work on this, like this paper[1] and this more recent one[2]. Again, not an expert, so I can't comment on the quality of the references; they're just some I found.

[1]: https://aclanthology.org/2022.findings-naacl.117/

[2]: https://aclanthology.org/2025.naacl-long.410/

8. loudmax No.45388155
LLMs certainly don't experience time like we do. They live in a uni-dimensional world that consists of a series of tokens (though it gets more nuanced if you account for multi-modal or diffusion models). They pick up some sense of ordering from their training data, such as "disregard my previous instruction," but it's not something they necessarily understand intuitively. Fundamentally, they're just following whatever patterns happen to be in their training data.
9. yggdrasil_ai No.45388296
>extension to transformers that can focus the attention on just the relevant context.

That is what transformer attention does in the first place, so you would just be stacking two transformers.

10. sebastiennight No.45389376
All it sees is a big blob of text, some of which can be structured to differentiate turns between "assistant", "user", "developer" and "system".

In theory you could attach metadata (with timestamps) to these turns, or include the timestamp in the text.

It does not change much, other than giving the model the possibility of making some inferences (e.g. that a previous message was on a different date, so its "today" is not the same "today" as in the latest message).

To chronologically fade out the importance of a conversation turn, you would need to either add more metadata (weak), progressively compact old turns (unreliable), or post-train a model to favor more recent areas of the context.
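
For the "include the timestamp in the text" option, a minimal sketch (hypothetical timestamped_turn helper, OpenAI-style message dicts assumed, model call left out):

  # Sketch of "put the timestamp in the text" so the model can at least
  # notice that two turns happened on different days. OpenAI-style message
  # dicts are assumed; the actual model call is left out.
  from datetime import datetime, timezone

  def timestamped_turn(role, content, when=None):
      when = when or datetime.now(timezone.utc)
      return {"role": role,
              "content": f"[{when.strftime('%Y-%m-%d %H:%M UTC')}] {content}"}

  history = [
      timestamped_turn("user", "Let's plan the release for next Friday.",
                       datetime(2024, 5, 2, 9, 0, tzinfo=timezone.utc)),
      timestamped_turn("user", "Actually, what's the status today?",
                       datetime(2024, 6, 10, 14, 30, tzinfo=timezone.utc)),
  ]
  for m in history:
      print(m["content"])

This only gives the model a chance to notice recency; actually making older turns count for less is the post-training problem mentioned above.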