    419 points serjester | 14 comments
    simonw No.43535919
    Yeah, the "book a flight" agent thing is a running joke now - it was a punchline in the Swyx keynote for the recent AI Engineer event in NYC: https://www.latent.space/p/agent

    I think this piece is underestimating the difficulty involved here, though. If only it were as easy as "just pick a single task and make the agent really good at that"!

    The problem is that if your UI involves human beings typing or talking to you in a human language, there is an unbounded set of ways things could go wrong. You can't test against every possible variant of what they might say. Humans are bad at clearly expressing things, but even worse is the challenge of ensuring they have a concrete, accurate mental model of what the software can and cannot do.

    replies(12): >>43536068 #>>43536088 #>>43536142 #>>43536257 #>>43536583 #>>43536731 #>>43537089 #>>43537591 #>>43539058 #>>43539104 #>>43539116 #>>43540011 #
    emn13 No.43536142
    Perhaps the solution(s) need to focus less on output quality and more on having a solid process for dealing with errors. Think undo, containers, git, CRDTs, or whatever, rather than zero tolerance for errors. That probably also means some kind of review for the irreversible bits of any process, and perhaps even process changes, where possible, to make common processes more reversible (which sounds like an extreme challenge in some cases).

    I can't imagine we're anywhere even close to the kind of perfection required not to need something like this - if it's even possible. Humans use all kinds of review and audit processes precisely because perfection is rarely attainable, and that might be fundamental.
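
    A minimal sketch of that idea in Python: agent actions are journaled, reversible ones carry an undo function, and irreversible ones are gated behind an explicit review step. All names here are hypothetical, not taken from any real framework.

        from dataclasses import dataclass, field
        from typing import Callable, Optional

        @dataclass
        class Action:
            description: str
            run: Callable[[], None]
            undo: Optional[Callable[[], None]] = None  # None means irreversible

        @dataclass
        class Journal:
            history: list = field(default_factory=list)

            def execute(self, action: Action, approve: Callable[[str], bool]) -> None:
                # Irreversible actions require review before they run.
                if action.undo is None and not approve(action.description):
                    raise PermissionError("rejected: " + action.description)
                action.run()
                self.history.append(action)

            def rollback(self) -> None:
                # Unwind reversible actions in reverse order; an irreversible
                # action is a hard stop, which is why it was reviewed up front.
                while self.history:
                    action = self.history.pop()
                    if action.undo is None:
                        raise RuntimeError("cannot undo: " + action.description)
                    action.undo()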

    replies(6): >>43536235 #>>43536390 #>>43536448 #>>43536860 #>>43536868 #>>43538708 #
    1. _bin_ No.43536868
    The biggest issue I've seen is "context window poisoning", for lack of a better term. If it screws something up, it's highly prone to repeating that mistake. It then makes a bad fix that propagates two more errors, then says, "Sure! Let me address that," and repeats the cycle, poorly patching those errors rather than the underlying issue (say, by restructuring the code to mitigate it).

    It is almost impossible to produce a useful result, as far as I've seen, unless one eliminates that mistake from the context window.
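
    A rough sketch of "eliminating the mistake from the context window": instead of appending a correction on top of a failed attempt, drop the poisoned turns before retrying. The {"role", "content"} message shape is the common chat format; how you decide which turns failed is up to you.

        def prune_failed_turns(messages: list, failed_indices: set) -> list:
            # Return a copy of the chat history without the poisoned turns.
            return [m for i, m in enumerate(messages) if i not in failed_indices]

        history = [
            {"role": "user", "content": "Refactor parse() to handle nested lists."},
            {"role": "assistant", "content": "(broken attempt)"},  # poisoned turn
            {"role": "user", "content": "That crashes on empty input."},
        ]
        # Retry from the pruned history so the model never sees its own bad fix.
        clean = prune_failed_turns(history, failed_indices={1, 2})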

    replies(4): >>43537158 #>>43537500 #>>43539768 #>>43547497 #
    2. instakill No.43537158
    I really, really wish that LLMs had an "eject" function: click on any message in a chat, and it would start a new clone chat seeded with the thread history up to that message.

    There are so many times when I get to a point where the conversation is finally flowing the way I want, and I would love to "fork" in several directions from that one specific part of the conversation.

    Instead I have to rely on a prompt that asks the LLM to compress the entire conversation into a non-prose format that attempts to be as semantically lossless as possible; sadly, this never works as intended.
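
    Mechanically, the wished-for fork is just a copy of the message list up to the chosen point; here is a small illustration, again assuming the generic chat-message format (all names are made up).

        def fork_chat(messages: list, at_index: int) -> list:
            # Clone the history up to and including message `at_index`.
            return [dict(m) for m in messages[: at_index + 1]]

        history = [
            {"role": "user", "content": "Help me plan the schema."},
            {"role": "assistant", "content": "(the reply worth branching from)"},
        ]
        branch_a = fork_chat(history, at_index=1)
        branch_b = fork_chat(history, at_index=1)
        branch_a.append({"role": "user", "content": "Now optimise for reads."})
        branch_b.append({"role": "user", "content": "Now optimise for writes."})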

    replies(4): >>43537724 #>>43537745 #>>43539090 #>>43547811 #
    3. bongodongobob No.43537500
    I think this is one of the core issues people have when trying to program with them. If you have a long conversation with a bunch of edits, it will start to get unreliable. I frequently start new chats to get around this and it seems to work well for me.
    replies(1): >>43541024 #
    4. theblazehen No.43537724
    You can use LibreChat, which allows you to fork messages: https://www.librechat.ai/docs/features/fork
    5. tough No.43537745
    Google's UI supports branching and deleting; someone recently made a blog post about how great it is.
    replies(1): >>43539049 #
    6. marlott No.43539049
    which Google UI?
    replies(1): >>43542605 #
    7. mvdtnz No.43539090
    This is precisely what the poorly named Edit button does in Claude.
    8. donmcronald No.43539768
    This is what I find. If it makes a mistake, trying to get it to fix the mistake is futile and you can't "teach" it to avoid that mistake in the future.
    replies(1): >>43546068 #
    9. _bin_ No.43541024
    Yes, this definitely helps. It's just incredibly annoying because you have to dump context back into it, re-type stuff, consolidate stuff from the prior conversation, etc.
    replies(1): >>43542417 #
    10. dr_kiszonka No.43542417
    Have the AI maintain a document (a local file or in canvas) with project goals, structure, setup instructions, current state, change log, todos, caveats, etc. You might need to remind it to keep the document up to date, but I find this approach quite useful.
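
    One way to wire that up, as a sketch: persist a plain-text state file and seed every fresh chat with it, so nothing from the old (possibly poisoned) conversation has to be re-typed. The file name and prompt wording are arbitrary choices.

        from pathlib import Path

        STATE_FILE = Path("PROJECT_STATE.md")  # hypothetical file name

        def load_state() -> str:
            return STATE_FILE.read_text() if STATE_FILE.exists() else ""

        def seed_new_chat(user_request: str) -> list:
            # A fresh conversation starts from the persisted state document.
            return [
                {"role": "system", "content": "Project state:\n" + load_state()},
                {"role": "user", "content": user_request},
            ]
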
    11. tough No.43542605
    Sorry, Google AI Studio (ai.dev).
    12. johnisgood No.43546068
    It depends; I ran into this a lot with GPT, but less so with Claude.

    But then again, I know how it could avoid the mistake, so I point that out, and from that point onwards it seems fine (in that chat).

    13. PeterStuer No.43547497
    "If it screws something up it’s highly prone to repeating that mistake"

    Certainly true, but coaching it past the mistake sometimes helps (not always):

    - roll back to the point before the mistake.

    - add instructions to avoid the same path: "Do not try X. We tried X; it does not work, as it leads to Y."

    - add resources that could clear up a misunderstanding (API documentation, library code).

    - rerun the request (improved/reworded with observed details or insights).

    I feel like some of the agentic frameworks already include some of these heuristics, but a helping hand can still work to your benefit; a sketch of the loop follows.
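
    Concretely, the four steps might look like this; `call_model` is a placeholder for whatever chat-completion call is in use, and all other names are hypothetical.

        def retry_past_mistake(messages: list, mistake_index: int,
                               avoid_note: str, resources: list, call_model):
            rolled_back = messages[:mistake_index]            # 1. roll back
            guidance = avoid_note                             # 2. block the bad path
            for doc in resources:                             # 3. add docs/library code
                guidance += "\n\nReference material:\n" + doc
            rolled_back.append({"role": "user", "content": guidance})
            return call_model(rolled_back)                    # 4. rerun the request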

    14. genewitch No.43547811
    LM Studio has a fork button on every chat part (sorry, can't think of a better word); you can fork on any human or AI part. You can also edit, but editing isn't really editing: it essentially creates a copy of the context with the edit and sends the whole thing to the AI. This can overflow your context window, so it isn't recommended. Forking does the same thing, of course, but there it is obvious, whereas people are surprised to learn that editing sends everything.
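
    A sketch of that edit behaviour: "editing" swaps one message in a copied history and resends everything, so the cost is the full context, not just the edited part. The four-characters-per-token figure is a crude stand-in for a real tokenizer.

        def edit_message(messages: list, index: int, new_content: str) -> list:
            copied = [dict(m) for m in messages]
            copied[index]["content"] = new_content
            return copied  # the entire copy is what gets sent to the model

        def roughly_fits(messages: list, context_limit_tokens: int = 8192) -> bool:
            total_chars = sum(len(m["content"]) for m in messages)
            return total_chars / 4 <= context_limit_tokens  # crude token estimate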
    LM studio has a fork button on every chat part. Sorry, can't think of a better word - you can fork on any human or ai part. You can also edit, but editing isn't, it essentially creates a copy of the context with the edit, and sends the whole thing to the AI. This can overflow your context window, so it isn't recommended. Forking of course does the same thing, but it is obvious that it is doing so, whereas people are surprised to learn editing sends everything.