
423 points | serjester | 1 comment
simonw No.43535919
Yeah, the "book a flight" agent thing is a running joke now - it was a punchline in the Swyx keynote for the recent AI Engineer event in NYC: https://www.latent.space/p/agent

I think this piece is underestimating the difficulty involved here, though. If only it were as easy as "just pick a single task and make the agent really good at that"!

The problem is that if your UI involves human beings typing or talking to you in a human language, there is an unbounded set of ways things could go wrong. You can't test against every possible variant of what they might say. Humans are bad at clearly expressing things, but even worse is the challenge of ensuring they have a concrete, accurate mental model of what the software can and cannot do.

replies(12)
yujzgzc No.43537089
I'm old enough to remember having to talk to a (human) agent in order to book flights, and can confirm that in my experience, the modern flight booking website is an order of magnitude better UX than talking to someone about your travel plans.
replies(3)
toasterlovin No.43542842
I think what we’ll come to widely realize is that syncing state between two minds (in your example, the travel agent’s mind and your mind; more broadly, AI agents and their users’ minds) is extremely expensive and slow. It’s going to be very hard to make these systems good enough to overcome the super low latency of keeping a task contained to a single mind, your own, and just doing most stuff yourself. The CPU/GPU dichotomy as a lens for viewing the world is widely applicable, IME.