    196 points zmccormick7 | 20 comments

    aliljet No.45387614
    There's a broad misunderstanding here. Context could be infinite, but the real bottleneck is understanding intent late in a multi-step operation. A human can effectively discard or disregard prior information as the narrow window of focus moves to a new task; LLMs seem incredibly bad at this.

    Having more context while remaining unable to focus on the latest task is the real problem.

    replies(10): >>45387639 #>>45387672 #>>45387700 #>>45387992 #>>45388228 #>>45388271 #>>45388664 #>>45388965 #>>45389266 #>>45404093 #
    bgirard No.45387700
    I think that's the real issue. If the LLM spends a lot of context investigating a bad solution and you redirect it, I notice it has trouble ignoring maybe 10K tokens of bad exploration context in favor of my 10 lines of 'No, don't do X, explore Y'.
    replies(6): >>45387838 #>>45387902 #>>45388477 #>>45390299 #>>45390619 #>>45394242 #
    1. dingnuts No.45387838
    That's because a next-token predictor can't "forget" context. That's just not how it works.

    You load the thing up with relevant context and pray that it guides the generation path to the part of the model that represents the information you want, and pray that the path of tokens through the model outputs what you want.

    That's why they have a tendency to go ahead and do things you tell them not to do.

    Also, IDK about you, but I hate how much praying has become part of the state of the art here. I didn't get into this career to be a fucking tech priest for the machine god. I will never like these models until they are predictable, which means I will never like them.

    replies(8): >>45387906 #>>45387974 #>>45387999 #>>45388198 #>>45388215 #>>45388542 #>>45388863 #>>45390695 #
    2. victorbjorklund No.45387906
    You can rewrite the history (but there are issues with that too), so an agent can forget context: simply don't feed part of the context back in on the next run.
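
    A minimal sketch of that, assuming an OpenAI-style chat client; the model name and the idea of tracking rejected turns are illustrative, not any particular agent's API:

        from openai import OpenAI

        client = OpenAI()

        def rerun_without(messages, rejected_indices):
            # Rebuild the history, skipping the turns the user rejected,
            # then re-prompt as if that exploration never happened.
            pruned = [m for i, m in enumerate(messages) if i not in rejected_indices]
            return client.chat.completions.create(model="gpt-4o", messages=pruned)
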
    3. dragonwriter No.45387974
    This is where the distinction between “an LLM” and “a user-facing system backed by an LLM” becomes important; the latter is often much more than a naive system for maintaining history and reprompting the LLM with added context from new user input, and could absolutely incorporate a step which (using the same LLM with different prompting or completely different tooling) edited the context before presenting it to the LLM to generate the response to the user. And such a system could, by that mechanism, “forget” selected context in the process.
    replies(2): >>45388257 #>>45388827 #
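
    A sketch of such a context-editing step, again assuming an OpenAI-style client; the prompt wording is hypothetical:

        from openai import OpenAI

        client = OpenAI()

        def reply_with_edited_context(history, user_input):
            # Pass 1: a differently-prompted call compresses the history,
            # dropping approaches the user has already rejected.
            edited = client.chat.completions.create(
                model="gpt-4o",
                messages=[
                    {"role": "system",
                     "content": "Rewrite this transcript, keeping decisions "
                                "and dropping rejected explorations."},
                    {"role": "user", "content": str(history)},
                ],
            ).choices[0].message.content
            # Pass 2: the user-facing reply sees only the edited context,
            # so the dead ends are effectively "forgotten".
            return client.chat.completions.create(
                model="gpt-4o",
                messages=[
                    {"role": "system", "content": edited},
                    {"role": "user", "content": user_input},
                ],
            )
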
    4. davedx No.45387999
    Yeah, I start a new session to mitigate this. Don't keep hammering away; close the current chat/session, whatever, and restate the problem carefully in a new one.
    replies(2): >>45388047 #>>45388661 #
    5. cjbgkagh No.45388047
    There should be a simple button that allows you to refine the context. A fresh LLM could generate a new context from the inputs and outputs of the chat history, then another fresh LLM could start over with that context.
    replies(3): >>45388179 #>>45388238 #>>45388840 #
    6. adastra22 No.45388179
    /compact in Claude Code.
    7. moffkalast No.45388198
    That's not how attention works, though; it should be perfectly able to figure out which parts are important and which aren't. The problem is that it doesn't really scale beyond small contexts, and it works on a token-to-token basis instead of being hierarchical with sentences, paragraphs, and sections. The only models that actually do long context do so by skipping attention layers, or doing something without attention or without positional encodings, all leading to shit performance. Nobody pretrains on more than like 8K, except maybe Google, who can throw TPUs at the problem.
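
    On the scaling point: naive attention scores every token against every other token, so the score matrix alone grows quadratically with context length. A toy numpy illustration (sizes chosen only to make the point):

        import numpy as np

        n, d = 8192, 64                               # context length, per-head dimension
        q = np.random.randn(n, d).astype(np.float32)
        k = np.random.randn(n, d).astype(np.float32)
        scores = (q @ k.T) / np.sqrt(np.float32(d))   # shape (n, n): one score per token pair
        print(scores.shape, scores.nbytes / 2**20)    # (8192, 8192), ~256 MB for a single head
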
    8. jofla_net No.45388215
    Relax, friend! I can't see why you'd be peeved in the slightest! Remember, the CEOs have it all figured out and have 'determined' that we don't need all those eyeballs on the code anymore. You can simply 'feed' the machine and do the work of forty devs! This is the new engineering! /s
    9. pulvinar No.45388238
    It's easy to miss: ChatGPT now has a "branch to new chat" option to branch off from any reply.
    10. yggdrasil_ai No.45388257
    I have been building Yggdrasil for that exact purpose - https://github.com/zayr0-9/Yggdrasil
    11. keeda No.45388542
    > I didn't get into this career to be a fucking tech priest for the machine god.

    You may appreciate this illustration I made (largely with AI, of course): https://imgur.com/a/0QV5mkS

    The context (heheheh) is a long-ass article on coding with AI I wrote eons ago that nobody ever read, if anybody is curious: https://news.ycombinator.com/item?id=40443374

    Looking back at it, I was off on a few predictions but a number of them are coming true.

    12. sethhochberg No.45388661
    I've had great luck with asking the current session to "summarize our goals, conversation, and other relevant details like git commits to this point in a compact but technically precise way that lets a new LLM pick up where we're leaving off".

    The new session throws away whatever behind-the-scenes context was causing problems, but the prepared prompt gets the new session up and running more quickly especially if picking up in the middle of a piece of work that's already in progress.

    replies(1): >>45388986 #
    13. PantaloonFlames No.45388827
    At least a few of the current coding agents have mechanisms that do what you describe.
    14. PantaloonFlames No.45388840
    You are saying “fresh LLM” but really I think you’re referring to a curated context. The existing coding agents have mechanisms to do this. Saving context to a file. Editing the file. Clearing all context except for the file. It’s sort of clunky now but it will get better and slicker.
    replies(1): >>45389185 #
    15. spyder No.45388863
    This is false:

    "that's because a next token predictor can't "forget" context. That's just not how it works."

    An LSTM is also a next-token predictor and literally has a forget gate, and there are many other context-compressing models too, which remember only what they think is important and forget the less important: for example, state-space models or RWKV, which work well as LLMs too. Even a basic GPT model forgets old context, since it gets truncated when it cannot fit, but that's not really the learned, smart forgetting the other models do.
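
    For reference, the forget gate in question, in a stripped-down textbook LSTM cell update (output gate omitted); a generic sketch, not any specific model:

        import numpy as np

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        def lstm_cell_update(x_t, h_prev, c_prev, W_f, W_i, W_c, b_f, b_i, b_c):
            z = np.concatenate([h_prev, x_t])
            f_t = sigmoid(W_f @ z + b_f)       # forget gate: near 0 erases, near 1 keeps
            i_t = sigmoid(W_i @ z + b_i)       # input gate for new information
            c_hat = np.tanh(W_c @ z + b_c)     # candidate cell state
            return f_t * c_prev + i_t * c_hat  # old memory is literally scaled away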

    16. DenisM No.45388986
    Wow, I had useless results asking ChatGPT to "please summarize important points of the discussion." It just doesn't understand what's important, and instead of highlighting pivotal moments of the conversation it produces a high-level introduction for a non-practitioner.

    Can you share your prompt?

    replies(1): >>45389913 #
    17. cjbgkagh No.45389185
    It seems I've missed this existing feature; I'm only a light user of LLMs. I'll keep an eye out for it.
    replies(1): >>45391924 #
    18. sethhochberg No.45389913
    Honestly, I just type out something by hand that is roughly like what I quoted above - I'm not big on keeping prompt libraries.

    I think the important part is to give it (in my case, these days "it" is gpt-5-codex) a target persona, just like giving it a specific problem instead of asking it to be clever or creative. I've never asked it for a summary of a long conversation without the context of why I want the summary and who the intended audience is, but I have to imagine that helps it frame its output.

    19. Mikhail_Edoshin No.45390695
    Well, "any sufficiently advanced technology is indistinguishable from magic." It's just that this one is the same in a bad way, not a good way.
    20. fzzzy No.45391924
    Some sibling comments mentioned that Claude Code has this.