AI agents: Less capability, more reliability, please

(www.sergey.fyi)

simonw ◴[31 Mar 25 15:08 UTC] No.43535919[source]▶

>>43535653 (OP) #

Yeah, the "book a flight" agent thing is a running joke now - it was a punchline in the Swyx keynote for the recent AI Engineer event in NYC: https://www.latent.space/p/agent

I think this piece is underestimating the difficulty involved here though. If only it was as easy as "just pick a single task and make the agent really good at that"!

The problem is that if your UI involves human beings typing or talking to you in a human language, there is an unbounded set of ways things could go wrong. You can't test against every possible variant of what they might say. Humans are bad at clearly expressing things, but even worse is the challenge of ensuring they have a concrete, accurate mental model of what the software can and cannot do.

replies(12): >>43536068 #>>43536088 #>>43536142 #>>43536257 #>>43536583 #>>43536731 #>>43537089 #>>43537591 #>>43539058 #>>43539104 #>>43539116 #>>43540011 #

1. hansmayer ◴[31 Mar 25 16:18 UTC] No.43536731[source]▶

>>43535919 #

It's so funny when people try to build robots imitating people. I mean part funny, part tragedy of the upcoming bust. The irony being, we would have been better off with an interoperable flight booking API standard which a deterministic headless agent could use to make perfect bookings every single time. There is a reason current user interfaces stem from a scientific discipline once called "Human-Computer Interaction".

replies(3): >>43537033 #>>43537160 #>>43538872 #

2. jatins ◴[31 Mar 25 16:44 UTC] No.43537033[source]▶

>>43536731 (TP) #

But that's the promise of AI, right? You can't put an API on everything for human + technological reasons.

replies(2): >>43537058 #>>43537071 #

3. hansmayer ◴[31 Mar 25 16:47 UTC] No.43537058[source]▶

>>43537033 #

It is a promise alright :)

4. dartos ◴[31 Mar 25 16:48 UTC] No.43537071[source]▶

>>43537033 #

You can’t put an API on everything because it’d take a ton of time and money to pull that off.

I can’t think of any technological reasons why every digital system can’t have an API (barring security concerns, as those would need to be case by case)

So instead, we put 100s of billions of dollars into statistical models hoping they could do it for us.

It’s kind of backwards.

replies(3): >>43537721 #>>43537952 #>>43538391 #

5. TeMPOraL ◴[31 Mar 25 16:56 UTC] No.43537160[source]▶

>>43536731 (TP) #

It's a business problem, not a tech problem. We don't have the solution you described because half of the air travel industry relies on things not being interoperable. AI is the solution at the limit, one set of companies selling users the ability to show a middle finger to a much wider set of companies - interoperability by literally having a digital human approximation pretending to be the user.

replies(2): >>43537239 #>>43539204 #

6. the_snooze ◴[31 Mar 25 17:03 UTC] No.43537239[source]▶

>>43537160 #

I've been a sentient human for at least the last 15 years of tech advancement. Assuming this stuff actually works, it's only a matter of time before these AI services claw back all that value for themselves and hold users and businesses hostage to one another, just like social media and e-commerce before. https://en.wikipedia.org/wiki/Enshittification

Unless these tools can be run locally independent of a service provider, we're just trading one boss for another.

replies(3): >>43537372 #>>43538500 #>>43543039 #

7. ben_w ◴[31 Mar 25 17:16 UTC] No.43537372{3}[source]▶

>>43537239 #

> Unless these tools can be run locally independent of a service provider, we're just trading one boss for another.

Many of them already can be. Many more existing models will become local options if/when RAM prices decline.

But this won't necessarily prevent enshittification, as there's always a possibility of a new model being tasked with pushing adverts or propaganda. And perhaps existing models already have been — certainly some people talk as if it's so.

replies(1): >>43544398 #

8. Scene_Cast2 ◴[31 Mar 25 17:49 UTC] No.43537721{3}[source]▶

>>43537071 #

You change who's paying.

replies(1): >>43537841 #

9. dartos ◴[31 Mar 25 18:01 UTC] No.43537841{4}[source]▶

>>43537721 #

Sure, as a biz it makes sense, but as a society, it’s obviously a big failure.

10. datadrivenangel ◴[31 Mar 25 18:12 UTC] No.43537952{3}[source]▶

>>43537071 #

A web page is an Application/Human Interface. Outside of security concerns, companies can make more money if they control the Application/Human Interface, and reduce the risk of a middleman / broker extorting them for margins.

If I run a flight aggregator that has a majority of flight bookings, I can start charging 'rents' by allowing featured/sponsored listings to be promoted above the 'best' result, leading to a prisoner's dilemma where airlines should pay up to their margins to keep market share.

If an AI company becomes the default application human interface, they can do the same thing. Pay OpenAI tribute or be ended as a going concern.

replies(1): >>43542902 #

11. daxfohl ◴[31 Mar 25 18:52 UTC] No.43538391{3}[source]▶

>>43537071 #

Exactly. It should take around 10 parameters to book a flight. Not 30,000,000,000 and a dedicated nuclear power plant.
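
As a rough illustration of the "around 10 parameters" point, a booking request could be sketched as a plain data structure with deterministic validation. This is only a sketch: `BookingRequest`, all of its field names, and the cabin-class values are hypothetical and do not come from any real airline or booking API.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional

# Hypothetical set of cabin classes, for illustration only.
CABIN_CLASSES = {"economy", "premium_economy", "business", "first"}


@dataclass(frozen=True)
class BookingRequest:
    """Hypothetical flight-booking request; every field name is illustrative."""
    origin: str                    # three-letter airport code, e.g. "JFK"
    destination: str               # three-letter airport code, e.g. "SFO"
    departure_date: date
    return_date: Optional[date]    # None for a one-way trip
    passenger_name: str
    passenger_birthdate: date
    cabin_class: str               # one of CABIN_CLASSES
    checked_bags: int
    fare_id: str                   # hypothetical identifier of the chosen fare
    payment_token: str             # hypothetical opaque token from a payment provider

    def validate(self) -> None:
        """Raise ValueError if the request is malformed; same input, same result."""
        for code in (self.origin, self.destination):
            if len(code) != 3 or not code.isalpha() or not code.isupper():
                raise ValueError(f"bad airport code: {code!r}")
        if self.origin == self.destination:
            raise ValueError("origin and destination must differ")
        if self.return_date is not None and self.return_date < self.departure_date:
            raise ValueError("return date is before departure date")
        if self.cabin_class not in CABIN_CLASSES:
            raise ValueError(f"unknown cabin class: {self.cabin_class!r}")
        if self.checked_bags < 0:
            raise ValueError("checked_bags must be non-negative")


if __name__ == "__main__":
    request = BookingRequest(
        origin="JFK",
        destination="SFO",
        departure_date=date(2025, 6, 1),
        return_date=date(2025, 6, 8),
        passenger_name="Ada Example",
        passenger_birthdate=date(1990, 1, 1),
        cabin_class="economy",
        checked_bags=1,
        fare_id="FARE-123",
        payment_token="tok_example",
    )
    request.validate()
    print(len(fields(BookingRequest)), "parameters")
```

The validation is ordinary branching code, so a malformed request is rejected the same way every time rather than with some probability.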

12. polishdude20 ◴[31 Mar 25 19:01 UTC] No.43538500{3}[source]▶

>>43537239 #

The difference is that social media isn't special because of its hardware or software even. People are stuck on Facebook because everyone else is on it. It's network effects. LLMs currently have no network effects. Your friends and family aren't "on" ChatGPT so why use that over something else?

Once performance of a local setup is on par with online ones or good enough, that'll be game over for them.

replies(1): >>43544361 #

13. doug_durham ◴[31 Mar 25 19:32 UTC] No.43538872[source]▶

>>43536731 (TP) #

Your use of the word "perfect" is doing a lot of heavy lifting. "Perfect" is a word embedded in a high dimensional space whose local maxima are different for every human on the planet.

replies(1): >>43545036 #

14. bluGill ◴[31 Mar 25 20:00 UTC] No.43539204[source]▶

>>43537160 #

The airlines rely on things not interoperating for you. However their agents interoperate all the time via code sharing. They don't want normal people to do this but if something goes wrong with the airplane you should be on they would prefer you to get there than not.

replies(1): >>43544646 #

15. dartos ◴[01 Apr 25 04:38 UTC] No.43542902{4}[source]▶

>>43537952 #

LLMs as a natural language interface is fine.

What I’m saying is that if there was a standard protocol for making travel plans over the internet, we wouldn’t need an AI agent to book a trip.

We could just create great user experiences that expose those APIs like we do for pretty much everything on the web.

16. aledalgrande ◴[01 Apr 25 05:08 UTC] No.43543039{3}[source]▶

>>43537239 #

> Unless these tools can be run locally independent of a service provider, we're just trading one boss for another.

Not only that, we have to be careful about all the integrations being built around it. Thankfully the MCP standard is becoming mainstream (used by Anthropic, OpenAI and next could be Google) and it's an open standard, even if started by Anthropic so we won't have e.g. Anthropic specific integrations.

replies(1): >>43544439 #

17. TeMPOraL ◴[01 Apr 25 08:51 UTC] No.43544361{4}[source]▶

>>43538500 #

All it takes is for the "omg AI slop!!111" and "would someone think of my copyrights?" crowd to get their way - resulting in a conventional or legal ban for using AI user-agents on the Internet without express consent of a site/service provider. From there, it will be APIs all over again: much like today, you can't easily pipe your Facebook photo to your OneDrive and make a calendar invite - but you can use (for example) Zapier with Facebook Integration, OneDrive Integration and Google Calendar Integration, we'll end up with LLM/chatbot companies whose main value is in their exclusive set of integrations they offer.

So true, it's not going to be "I use PolishDude20GPTBook because my family and friends are on it". It's going to be, "I use PolishDude20GPTBook because they have contracts with Gazeta.pl, Onet, TVN24, OLX and Allegro, so I can use it to get local news and find best-priced products in a convenient way, whereas I can't use TeMPOraLxity for any of that".

Contracts over APIs, again.

As long as the "think of my copyright / AI slop oneoneone" crowd wins. It must not.

replies(1): >>43554594 #

18. TeMPOraL ◴[01 Apr 25 08:56 UTC] No.43544398{4}[source]▶

>>43537372 #

People are worried about the wrong side of the equation. Other problems with them notwithstanding, it's not the browser wars that killed interoperability on the Web - it's everyone else. Any browser you ever used could issue the same HTTP calls (up to standards of a given time, ofc.) - but it helps you with nothing if the endpoint only works when you've signed a contract to access the private API.

The same fate may come to AI, and that worries me. It won't matter whether you're using OpenAI models, Anthropic models, or locally run models, any more than it matters whether you use Firefox, Chrome or raw cURL - if the business gets to differentiate further between users and AI agents working as users, and especially if they get legal backing to doing that, you can kiss all the benefits of LLMs goodbye, they won't be yours as end-user, they'll all accrue to capitalists, who in turn will lend slivers of it to you, for a price of a subscription.

replies(1): >>43547640 #

19. TeMPOraL ◴[01 Apr 25 09:00 UTC] No.43544439{4}[source]▶

>>43543039 #

See my replies to other comments parallel to yours. But in short: MCP doesn't help us any more than cURL lets you replicate Zapier in a shell script - the bad future is that, like with APIs, service providers get to differentiate between humans and AI user-agents, and restrict the latter to endpoints governed by B2B contracts.

20. TeMPOraL ◴[01 Apr 25 09:27 UTC] No.43544646{3}[source]▶

>>43539204 #

> They don't want normal people to do this

That's the root of the problem. That's precisely why computers are not the "bicycles for the minds" they were imagined to be.

It's not a conspiracy theory, either. Most of the tech industry makes money inserting themselves between you and your problem and trying to make sure you're stuck with them.

21. hansmayer ◴[01 Apr 25 10:22 UTC] No.43545036[source]▶

>>43538872 #

No, it's just the intuitively perfect that comes to mind in this context, i.e. reliable and guaranteed to produce a safe outcome. Much like the Amazon checkout process. I am fine giving my credit card details to near-perfect automatons like that. I will never give them to a statistical model, which may or may not hallucinate the sum it is supposed to enter into an interface built for humans, not computers.

22. mdaniel ◴[01 Apr 25 15:03 UTC] No.43547640{5}[source]▶

>>43544398 #

> Any browser you ever used could issue the same HTTP calls (up to standards of a given time, ofc.) - but it helps you with nothing if the endpoint only works when you've signed a contract to access the private API.

Oh, you mean like everyone who shows up to the Cloudflare submissions pointing out how they've been blocklisted from about 50% of the Internet, without recourse, due to the audacity to not run Chrome? In that circumstance, it's actually worse(?) because to the best of my knowledge I cannot subscribe to Cloudflare Verified to not get the :fu: I just have to hope the Eye of Sauron doesn't find me

That reminds me, it's probably time for my semi-annual Google Takeout

replies(1): >>43550428 #

23. TeMPOraL ◴[01 Apr 25 19:21 UTC] No.43550428{6}[source]▶

>>43547640 #

Yeah, that's just an extension of what I said. After all, it's not Google/Chrome that's creating this problem - it's Cloudflare and people who buy this service from them, by making the lazy/economically prudent assumption that anyone who has an opinion on how they consume services can be bucketed together with scammers and denied access.

It stems from the problem I described though - blocking you for not using Chrome is just "only illegitimate users don't use Chrome", which is the next step after "only illegitimate users would want to use our API endpoints without starting a business and signing a formal contract with us".

24. hansmayer ◴[02 Apr 25 08:08 UTC] No.43554594{5}[source]▶

>>43544361 #

The only reason that there is an "AI-slop-crowd" (as you call it) is that, well...there is a lot of (Gen-)AI slop. If the technology was as miraculous as it has been hyped up for several years now, there would be no such crowd. Everyone would just get on. If a tech just does what it says it does, everyone gets on board. The Internet is a great example of this, so were the smartphones after the iPhone moment. There was never an Anti-Internet-Crowd, I wonder why that might be?

replies(1): >>43561998 #

25. TeMPOraL ◴[02 Apr 25 21:43 UTC] No.43561998{6}[source]▶

>>43554594 #

> There was never an Anti-Internet-Crowd, I wonder why that might be?

You forgot the dotcom boom? :)

Existence of AI slop has nothing to do with whether the tech itself is exceeding or falling short of its hype. It exists because it's good enough for advertising, the cancer on modern society that metastasizes to every new medium and technology, defiling and destroying everything it touches.

replies(1): >>43568273 #

26. hansmayer ◴[03 Apr 25 11:50 UTC] No.43568273{7}[source]▶

>>43561998 #

No, what do you mean by the dotcom boom? That's a completely wrong comparison. The dotcom-boom had more to do with the credit crunch of 2008, not with the inherent qualities of the Internet as a technology. With or without the dotcom boom, people continued to use the Internet, because it was useful to them. I don't remember anyone promising us that the Internet would be able to do this and that, if only... It just worked and everyone got what it was about, right from the start. Where is that moment with LLMs? Summarising my emails, inventing nonexistent functions in my code? Hard pass :)
