AI agents: Less capability, more reliability, please

(www.sergey.fyi)

423 points serjester | 3 comments | 31 Mar 25 14:45 UTC


simonw ◴[31 Mar 25 15:08 UTC] No.43535919[source]▶

Yeah, the "book a flight" agent thing is a running joke now - it was a punchline in the Swyx keynote for the recent AI Engineer event in NYC: https://www.latent.space/p/agent

I think this piece is underestimating the difficulty involved here though. If only it were as easy as "just pick a single task and make the agent really good at that"!

The problem is that if your UI involves human beings typing or talking to you in a human language, there is an unbounded set of ways things could go wrong. You can't test against every possible variant of what they might say. Humans are bad at clearly expressing things, but even worse is the challenge of ensuring they have a concrete, accurate mental model of what the software can and cannot do.

replies(12): >>43536068 #>>43536088 #>>43536142 #>>43536257 #>>43536583 #>>43536731 #>>43537089 #>>43537591 #>>43539058 #>>43539104 #>>43539116 #>>43540011 #

hansmayer ◴[31 Mar 25 16:18 UTC] No.43536731[source]▶

>>43535919 #

It's so funny when people try to build robots imitating people. I mean part funny, part tragedy of the upcoming bust. The irony being, we would have been better off with an interoperable flight booking API standard which a deterministic headless agent could use to make perfect bookings every single time. There is a reason current user interfaces stem from a scientific discipline once called "Human-Computer Interaction".

replies(3): >>43537033 #>>43537160 #>>43538872 #

TeMPOraL ◴[31 Mar 25 16:56 UTC] No.43537160[source]▶

>>43536731 #

It's a business problem, not a tech problem. We don't have the solution you described because half of the air travel industry relies on things not being interoperable. AI is the solution at the limit: one set of companies selling users the ability to show a middle finger to a much wider set of companies - interoperability by literally having a digital human approximation pretending to be the user.

replies(2): >>43537239 #>>43539204 #

the_snooze ◴[31 Mar 25 17:03 UTC] No.43537239[source]▶

>>43537160 #

I've been a sentient human for at least the last 15 years of tech advancement. Assuming this stuff actually works, it's only a matter of time before these AI services claw back all that value for themselves and hold users and businesses hostage to one another, just like social media and e-commerce before. https://en.wikipedia.org/wiki/Enshittification

Unless these tools can be run locally independent of a service provider, we're just trading one boss for another.

replies(3): >>43537372 #>>43538500 #>>43543039 #

ben_w ◴[31 Mar 25 17:16 UTC] No.43537372{3}[source]▶

>>43537239 #

> Unless these tools can be run locally independent of a service provider, we're just trading one boss for another.

Many of them already can be. Many more existing models will become local options if/when RAM prices decline.

But this won't necessarily prevent enshittification, as there's always a possibility of a new model being tasked with pushing adverts or propaganda. And perhaps existing models already have been — certainly some people talk as if it's so.

replies(1): >>43544398 #

1. TeMPOraL ◴[01 Apr 25 08:56 UTC] No.43544398{4}[source]▶

>>43537372 #

People are worried about the wrong side of the equation. Other problems with them notwithstanding, it's not the browser wars that killed interoperability on the Web - it's everyone else. Any browser you ever used could issue the same HTTP calls (up to standards of a given time, ofc.) - but it helps you with nothing if the endpoint only works when you've signed a contract to access the private API.

The same fate may come to AI, and that worries me. It won't matter whether you're using OpenAI models, Anthropic models, or locally run models, any more than it matters whether you use Firefox, Chrome or raw cURL. If businesses get to differentiate further between users and AI agents working as users, and especially if they get legal backing for doing that, you can kiss all the benefits of LLMs goodbye. They won't be yours as the end user; they'll all accrue to capitalists, who in turn will lend slivers of them to you for the price of a subscription.

replies(1): >>43547640 #

2. mdaniel ◴[01 Apr 25 15:03 UTC] No.43547640[source]▶

>>43544398 (TP) #

> Any browser you ever used could issue the same HTTP calls (up to standards of a given time, ofc.) - but it helps you with nothing if the endpoint only works when you've signed a contract to access the private API.

Oh, you mean like everyone who shows up to the Cloudflare submissions pointing out how they've been blocklisted from about 50% of the Internet, without recourse, for having the audacity to not run Chrome? In that circumstance, it's actually worse(?) because, to the best of my knowledge, I cannot subscribe to Cloudflare Verified to not get the :fu:. I just have to hope the Eye of Sauron doesn't find me.

That reminds me, it's probably time for my semi-annual Google Takeout.

replies(1): >>43550428 #

3. TeMPOraL ◴[01 Apr 25 19:21 UTC] No.43550428[source]▶

>>43547640 #

Yeah, that's just an extension of what I said. After all, it's not Google/Chrome that's creating this problem - it's Cloudflare and people who buy this service from them, by making the lazy/economically prudent assumption that anyone who has an opinion on how they consume services can be bucketed together with scammers and denied access.

It stems from the problem I described though - blocking you for not using Chrome is just "only illegitimate users don't use Chrome", which is the next step after "only illegitimate users would want to use our API endpoints without starting a business and signing a formal contract with us".
