423 points serjester | 2 comments
simonw ◴[] No.43535919[source]
Yeah, the "book a flight" agent thing is a running joke now - it was a punchline in the Swyx keynote for the recent AI Engineer event in NYC: https://www.latent.space/p/agent

I think this piece underestimates the difficulty involved here, though. If only it were as easy as "just pick a single task and make the agent really good at that"!

The problem is that if your UI involves human beings typing or talking to you in a human language, there is an unbounded set of ways things could go wrong. You can't test against every possible variant of what they might say. Humans are bad at clearly expressing things, but even worse is the challenge of ensuring they have a concrete, accurate mental model of what the software can and cannot do.

replies(12): >>43536068 #>>43536088 #>>43536142 #>>43536257 #>>43536583 #>>43536731 #>>43537089 #>>43537591 #>>43539058 #>>43539104 #>>43539116 #>>43540011 #
hansmayer ◴[] No.43536731[source]
It's so funny when people try to build robots imitating people. I mean part funny, part tragedy of the upcoming bust. The irony being, we would have been better off with an interoperable flight booking API standard which a deterministic headless agent could use to make perfect bookings every single time. There is a reason current user interfaces stem from a scientific discipline once called "Human-Computer Interaction".
replies(3): >>43537033 #>>43537160 #>>43538872 #
1. doug_durham ◴[] No.43538872[source]
Your use of the word "perfect" is doing a lot of heavy lifting. "Perfect" is a word embedded in a high dimensional space whose local maxima are different for every human on the planet.
replies(1): >>43545036 #
2. hansmayer ◴[] No.43545036[source]
No, I just mean the intuitive sense of "perfect" in this context, i.e. reliable and guaranteed to produce a safe outcome, much like the Amazon checkout process. I am fine giving my credit card details to near-perfect automatons like that. I will never give them to a statistical model, which may or may not hallucinate the sum it is supposed to enter into an interface built for humans, not computers.