AI agents: Less capability, more reliability, please

(www.sergey.fyi)

423 points serjester | 2 comments | 31 Mar 25 14:45 UTC

simonw ◴[31 Mar 25 15:08 UTC] No.43535919[source]▶

Yeah, the "book a flight" agent thing is a running joke now - it was a punchline in the Swyx keynote for the recent AI Engineer event in NYC: https://www.latent.space/p/agent

I think this piece is underestimating the difficulty involved here though. If only it were as easy as "just pick a single task and make the agent really good at that"!

The problem is that if your UI involves human beings typing or talking to you in a human language, there is an unbounded set of ways things could go wrong. You can't test against every possible variant of what they might say. Humans are bad at clearly expressing things, but even worse is the challenge of ensuring they have a concrete, accurate mental model of what the software can and cannot do.

replies(12): >>43536068 #>>43536088 #>>43536142 #>>43536257 #>>43536583 #>>43536731 #>>43537089 #>>43537591 #>>43539058 #>>43539104 #>>43539116 #>>43540011 #

hansmayer ◴[31 Mar 25 16:18 UTC] No.43536731[source]▶

>>43535919 #

It's so funny when people try to build robots imitating people. I mean part funny, part tragedy of the upcoming bust. The irony being, we would have been better off with an interoperable flight booking API standard which a deterministic headless agent could use to make perfect bookings every single time. There is a reason current user interfaces stem from a scientific discipline once called "Human-Computer Interaction".

replies(3): >>43537033 #>>43537160 #>>43538872 #

TeMPOraL ◴[31 Mar 25 16:56 UTC] No.43537160[source]▶

>>43536731 #

It's a business problem, not a tech problem. We don't have the solution you described because half of the air travel industry relies on things not being interoperable. AI is the solution at the limit: one set of companies selling users the ability to show a middle finger to a much wider set of companies - interoperability by literally having a digital human approximation pretending to be the user.

replies(2): >>43537239 #>>43539204 #

the_snooze ◴[31 Mar 25 17:03 UTC] No.43537239[source]▶

>>43537160 #

I've been a sentient human for at least the last 15 years of tech advancement. Assuming this stuff actually works, it's only a matter of time before these AI services claw back all that value for themselves and hold users and businesses hostage to one another, just like social media and e-commerce before. https://en.wikipedia.org/wiki/Enshittification

Unless these tools can be run locally independent of a service provider, we're just trading one boss for another.

replies(3): >>43537372 #>>43538500 #>>43543039 #

1. aledalgrande ◴[01 Apr 25 05:08 UTC] No.43543039[source]▶

>>43537239 #

> Unless these tools can be run locally independent of a service provider, we're just trading one boss for another.

Not only that, we have to be careful about all the integrations being built around it. Thankfully the MCP standard is becoming mainstream (used by Anthropic, OpenAI and next could be Google) and it's an open standard, even though Anthropic started it, so we won't end up with, e.g., Anthropic-specific integrations.

replies(1): >>43544439 #

2. TeMPOraL ◴[01 Apr 25 09:00 UTC] No.43544439[source]▶

>>43543039 (TP) #

See my replies to other comments parallel to yours. But in short: MCP doesn't help us any more than cURL lets you replicate Zapier in a shell script - the bad future is that, like with APIs, service providers get to differentiate between humans and AI user-agents, and restrict the latter to endpoints governed by B2B contracts.
