AI agents: Less capability, more reliability, please

(www.sergey.fyi)

423 points serjester | 2 comments | 31 Mar 25 14:45 UTC

simonw ◴[31 Mar 25 15:08 UTC] No.43535919

Yeah, the "book a flight" agent thing is a running joke now - it was a punchline in the Swyx keynote for the recent AI Engineer event in NYC: https://www.latent.space/p/agent

I think this piece is underestimating the difficulty involved here though. If only it was as easy as "just pick a single task and make the agent really good at that"!

The problem is that if your UI involves human beings typing or talking to you in a human language, there is an unbounded set of ways things could go wrong. You can't test against every possible variant of what they might say. Humans are bad at clearly expressing things, but even worse is the challenge of ensuring they have a concrete, accurate mental model of what the software can and cannot do.

serjester ◴[31 Mar 25 15:36 UTC] No.43536257

>>43535919

Even in Operator's original demo, the first things they showed were booking restaurant reservations and ordering groceries. I understand their need to demo something intuitive, but it's still debatable whether these are tasks most people want delegated to black-box agents.

ToucanLoucan ◴[31 Mar 25 18:52 UTC] No.43538396

>>43536257

They don't. I have never once in my life wanted to talk to my smart speaker about what I want for dinner. Not because a smart speaker is (or can be) creepy, not because of social anxiety, no: it's just simpler and more straightforward to open DoorDash on my damn phone and look at a list of nearby restaurants to order from. Or browse a list of products on Amazon to buy. Or just call a restaurant to get a reservation. These tasks are trivial.

And like, as a socially anxious millennial, no I don't particularly like phone calls. However I also recognize that setting my discomfort aside, a direct connection to a human being who can help reason out a problem I'm having is not something easily replaced with a chatbot or an AI assistant. It just isn't. Perfect example: called a place to make a reservation for myself, my wife and girlfriend (poly long story) and found the place didn't usually do reservations on the day in question, but the person did ask when we'd be there. As I was talking to a person, I could provide that information immediately, and say "if you don't take reservations don't worry, that's fine," but it was an off-busy hour so we got one anyway. How does an AI navigate that conversation more efficiently than me?

As a techie person I basically spend the entire day interacting with various software to perform various tasks, work related and otherwise. I cannot overstate: NONE of these interactions, not a single one, is improved one iota by turning it into a conversation, verbal or text-based, with my or someone else's computer. By definition it makes basic tasks take longer, every time, without fail.

bluGill ◴[31 Mar 25 20:06 UTC] No.43539272

>>43538396

More than once I've been on a road trip and realized that I wanted something to help me find a meal where I'll be sometime in the next 2 hours. I have no idea what the options are and I can't find them. All too often I've settled for generic fast food when I really wanted something local, but I couldn't get Maps to show me, and such places are often one street away where I wouldn't see them. (Remember too that if I'm driving I can't spend time scrolling through a list, but even when I'm the navigator, the interface I can find in Maps isn't good.)

viraptor ◴[01 Apr 25 06:54 UTC] No.43543636

>>43539272

I'm curious what the problem is with that task. I'd open Google Maps, find a larger place in the right direction, confirm with directions that it's about 2h away, search for "dinner/lunch/restaurant/Japanese/tacos/..." in the visible area, and choose something highly rated. I've done that successfully lots of times. What part of that fails for you? (As a non-driver, of course.)

bluGill ◴[01 Apr 25 12:58 UTC] No.43546266

>>43543636

The problem is choice. I don't care about Japanese or tacos; either would be fine, but Argentine would be better (I have no idea if that's even a thing, but if it is I want to try it). I don't want a chain (well, maybe a local chain). I have plenty of McDonald's near my house if I want that; I want something I can't get near home. Maps puts all the big chains that pay for the top spot right at the top, and I have to scroll past them. More than once I've seen something that might be interesting, but then the map scrolls or resizes and I can't find it anymore.

ToucanLoucan ◴[01 Apr 25 13:07 UTC] No.43546362

>>43546266

But you're taking it as a given that the AI is going to have any better idea than Google Maps, or be subject to less interference from marketing and paid placement, when, like... I'd be willing to bet a small amount of money that it's going to do exactly what you're decrying: it's going to search $localized_area for "restaurant" and, if you're lucky, maybe add -chain to it. What you want here are locals' notions of what's good and what isn't, and while I absolutely respect the shit out of that (and would love it myself!), I don't really know how to facilitate that at scale without immediately caving to the same negative influences that are screwing it up right now.

Like, what you really want is legitimate information not bound to the whims of advertisers and marketers (and again, to be clear, don't we fucking all), but I don't think an LLM is going to do that for you. If it does it now, and that's a load-bearing if, I have a strong feeling that's because this tech, like all tech, is in its infancy. It hasn't yet gotten enough attention from corporations and their slimy marketing divisions, but that's a temporary state of affairs, as it has been for every past tech. Like, OpenAI just closed another funding round and its valuation is now THREE HUNDRED BILLION. Do you REALLY think they, and by extension their competitors, are going to be thinking about editorial independence when existing established information institutions already can't?

↑