AI agents: Less capability, more reliability, please

(www.sergey.fyi)

423 points serjester | 2 comments | 31 Mar 25 14:45 UTC


simonw ◴[31 Mar 25 15:08 UTC] No.43535919[source]▶

Yeah, the "book a flight" agent thing is a running joke now - it was a punchline in the Swyx keynote for the recent AI Engineer event in NYC: https://www.latent.space/p/agent

I think this piece is underestimating the difficulty involved here, though. If only it were as easy as "just pick a single task and make the agent really good at that"!

The problem is that if your UI involves human beings typing or talking to you in a human language, there is an unbounded set of ways things could go wrong. You can't test against every possible variant of what they might say. Humans are bad at clearly expressing things, but even worse is the challenge of ensuring they have a concrete, accurate mental model of what the software can and cannot do.

replies(12): >>43536068 #>>43536088 #>>43536142 #>>43536257 #>>43536583 #>>43536731 #>>43537089 #>>43537591 #>>43539058 #>>43539104 #>>43539116 #>>43540011 #

photonthug ◴[31 Mar 25 21:11 UTC] No.43540011[source]▶

>>43535919 #

> The problem is that if your UI involves human beings typing or talking to you in a human language, there is an unbounded set of ways things could go wrong. You can't test against every possible variant of what they might say.

It's almost like we really might benefit from using the advances in AI for stuff like speech recognition to build concrete interfaces with specific predefined vocabularies and a local-first UX. But stuff like that undermines a cloud-based service, a constantly changing interface, and the opportunities for general spying and manufacturing "engagement" while people struggle to use the stuff you've made. And of course, producing actual specifications means that you would have to own bugs. Besides eliminating employees, much of the interest in AI is about completely eliminating responsibility. Speaking as a user of ML-based monitoring products and such for years: "intelligence" usually implies no real specifications, no specifications implies no bugs, and no bugs implies rent-seeking behaviour without the burden of any actual responsibilities.

It's frustrating to see how often even technologists buy the story that "users don't want/need concrete specifications" or that "users aren't smart enough to deal with concrete interfaces". It's a trick.

replies(2): >>43541097 #>>43543751 #

freeone3000 ◴[31 Mar 25 23:23 UTC] No.43541097[source]▶

>>43540011 #

> concrete interfaces with specific predefined vocabularies and a local-first UX

An app? We don’t even need to put AI in it, turns out you can book flights without one.

replies(2): >>43541512 #>>43542179 #

photonthug ◴[01 Apr 25 00:22 UTC] No.43541512[source]▶

>>43541097 #

Tech won't freeze in place exactly where it's at today even if some people want that, and even if in some cases it actually would make sense. And.. if you advocate for this I think you risk losing credibility. Especially amongst technologists it's better to think critically about structural problems with the trends and trajectories. AI is fine, change is fine.. the question now is really more like why and what for and in the interest of whom. To the extent models work locally, we'll be empowered in the end.

Similarly, software eating the world was actually pretty much fine, but SaaS is/was a bit of a trap. And anyone who thought SaaS was bad should be terrified about the moats and platform lock-in that billion dollar models might mean, the enshittification that inevitably follows market dominance, etc.

Honestly we kinda need a new Stallman for the brave new world, someone who is relentlessly beating the drum on this stuff even if they come across as anticorporate and extreme. An extremist might get traction, but a call to preserve things as they are probably cannot / should not.

replies(3): >>43542731 #>>43546231 #>>43547456 #

PKop ◴[01 Apr 25 14:49 UTC] No.43547456[source]▶

>>43541512 #

>And.. if you advocate for this I think you risk losing credibility

It's a shame if new interface = credible by default. Look at the car manufacturers (well, some, probably not enough) finally conceding, after many years, that the change to touch interfaces "because new" was a terrible idea, when the old tool was simply the right one for the job... and that was obvious to end users very quickly.

replies(1): >>43552042 #

photonthug ◴[01 Apr 25 22:42 UTC] No.43552042[source]▶

>>43547456 #

Again in that case the newness of different tech isn’t actually the real problem and feels like the wrong critique. What’s problematic is trajectory and intent with things like planned obsolescence, subscriptions, ongoing costs in repairs after initial sale. I’d say that a new interface is barely even an issue compared with that.. although fwiw, yes, I prefer buttons rather than touch screens.

replies(1): >>43552924 #

PKop ◴[02 Apr 25 01:29 UTC] No.43552924[source]▶

>>43552042 #

>the newness of different tech isn’t actually the real problem and feels like the wrong critique

I'm not equating new = bad. I'm saying new = good is wrong. And based on your last sentence, you do think car manufacturers all switching over to all-touch controls was a problem. Almost everyone prefers buttons to touch screens; that's my point. The better, more popular option was rejected because of a false premise or false belief.
