AI agents: Less capability, more reliability, please

(www.sergey.fyi)

423 points serjester | 3 comments | 31 Mar 25 14:45 UTC

simonw ◴[31 Mar 25 15:08 UTC] No.43535919[source]▶

Yeah, the "book a flight" agent thing is a running joke now - it was a punchline in the Swyx keynote for the recent AI Engineer event in NYC: https://www.latent.space/p/agent

I think this piece underestimates the difficulty involved here, though. If only it were as easy as "just pick a single task and make the agent really good at that"!

The problem is that if your UI involves human beings typing or talking to you in a human language, there is an unbounded set of ways things could go wrong. You can't test against every possible variant of what they might say. Humans are bad at clearly expressing things, but even worse is the challenge of ensuring they have a concrete, accurate mental model of what the software can and cannot do.

replies(12): >>43536068 #>>43536088 #>>43536142 #>>43536257 #>>43536583 #>>43536731 #>>43537089 #>>43537591 #>>43539058 #>>43539104 #>>43539116 #>>43540011 #

emn13 ◴[31 Mar 25 15:26 UTC] No.43536142[source]▶

>>43535919 #

Perhaps the solution(s) need to focus less on output quality and more on having a solid process for dealing with errors. Think undo, containers, git, CRDTs, or whatever, rather than zero tolerance for errors. That probably also means some kind of review for the irreversible bits of any process, and perhaps even process changes, where possible, to make common processes more reversible (which sounds like an extreme challenge in some cases).

I can't imagine we're anywhere even close to the kind of perfection required not to need something like this - if it's even possible. Humans use all kinds of review and audit processes precisely because perfection is rarely attainable, and that might be fundamental.
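
To make the "undo plus review for the irreversible bits" idea concrete, here is a minimal sketch, assuming an agent whose steps can each be wrapped in an object. Every name in it (`Action`, `Journal`, `approve`) is hypothetical and invented for illustration; it is not taken from any existing agent framework.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Action:
    """One step an agent wants to take (hypothetical structure)."""
    name: str
    do: Callable[[], None]
    undo: Optional[Callable[[], None]] = None  # None marks the step as irreversible


@dataclass
class Journal:
    """Runs actions, remembers how to undo them, and gates irreversible ones."""
    approve: Callable[[Action], bool]  # human or policy review hook (assumed)
    done: List[Action] = field(default_factory=list)

    def run(self, action: Action) -> bool:
        # Irreversible steps need explicit approval before they execute.
        if action.undo is None and not self.approve(action):
            return False
        action.do()
        self.done.append(action)
        return True

    def rollback(self) -> List[str]:
        """Undo reversible steps, newest first; return names that could not be undone."""
        stuck = []
        while self.done:
            action = self.done.pop()
            if action.undo is None:
                stuck.append(action.name)
            else:
                action.undo()
        return stuck
```

Reversible steps run freely because a mistake can be rolled back, while anything without an `undo` has to pass the `approve` hook first. `rollback()` reports the irreversible steps it could not reverse, which is roughly where an audit process would take over.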

replies(6): >>43536235 #>>43536390 #>>43536448 #>>43536860 #>>43536868 #>>43538708 #

ModernMech ◴[31 Mar 25 19:19 UTC] No.43538708[source]▶

>>43536142 #

> Perhaps the solution(s) need to focus less on output quality and more on having a solid process for dealing with errors. Think undo, containers, git, CRDTs

LLMs are supposed to save us from the toils of software engineering, but it looks like we're going to reinvent software engineering to make AI useful.

Problem: Programming languages are too hard.

Solution: AI!

Problem: AI is not reliable; it's hard to specify problems precisely enough that it understands what I mean unambiguously.

Solution: Programming languages!

replies(2): >>43539173 #>>43554011 #

1. Workaccount2 ◴[31 Mar 25 19:58 UTC] No.43539173[source]▶

>>43538708 #

With pretty much every new technology, society has bent towards the tech too.

When smartphones first popped up, browsing the web on them was a pain. Now pretty much the whole web has phone versions that make it easier*.

*I recognize the folly of stating this on HN.

replies(1): >>43540691 #

2. LtWorf ◴[31 Mar 25 22:35 UTC] No.43540691[source]▶

>>43539173 (TP) #

No, it's still a pain.

There are apps that open links in their embedded browser, where ads aren't blocked, so I need to copy the link and open it in my real browser.

replies(1): >>43541979 #

3. mdaniel ◴[01 Apr 25 01:42 UTC] No.43541979[source]▶

>>43540691 #

Or my other favorite trap: an embedded browser where I'm not authenticated. Great, now I have to roll the dice on pasting a password into your "trust me, bro"-looking login page, because I cannot see the URL and the autofill is all "nope".