AI agents: Less capability, more reliability, please

(www.sergey.fyi)

423 points serjester | 1 comments | 31 Mar 25 14:45 UTC

simonw ◴[31 Mar 25 15:08 UTC] No.43535919[source]▶

Yeah, the "book a flight" agent thing is a running joke now - it was a punchline in the Swyx keynote for the recent AI Engineer event in NYC: https://www.latent.space/p/agent

I think this piece is underestimating the difficulty involved here though. If only it was as easy as "just pick a single task and make the agent really good at that"!

The problem is that if your UI involves human beings typing or talking to you in a human language, there is an unbounded set of ways things could go wrong. You can't test against every possible variant of what they might say. Humans are bad at clearly expressing things, but even worse is the challenge of ensuring they have a concrete, accurate mental model of what the software can and cannot do.

replies(12): >>43536068 #>>43536088 #>>43536142 #>>43536257 #>>43536583 #>>43536731 #>>43537089 #>>43537591 #>>43539058 #>>43539104 #>>43539116 #>>43540011 #

hansmayer ◴[31 Mar 25 16:18 UTC] No.43536731[source]▶

>>43535919 #

It's so funny when people try to build robots imitating people. I mean part funny, part tragedy of the upcoming bust. The irony being, we would have been better off with an interoperable flight booking API standard which a deterministic headless agent could use to make perfect bookings every single time. There is a reason current user interfaces stem from a scientific discipline once called "Human-Computer Interaction".

replies(3): >>43537033 #>>43537160 #>>43538872 #

TeMPOraL ◴[31 Mar 25 16:56 UTC] No.43537160[source]▶

>>43536731 #

It's a business problem, not a tech problem. We don't have the solution you described because half of the air travel industry relies on things not being interoperable. AI is the solution at the limit: one set of companies selling users the ability to show a middle finger to a much wider set of companies - interoperability by literally having a digital human approximation pretending to be the user.

replies(2): >>43537239 #>>43539204 #

the_snooze ◴[31 Mar 25 17:03 UTC] No.43537239[source]▶

>>43537160 #

I've been a sentient human for at least the last 15 years of tech advancement. Assuming this stuff actually works, it's only a matter of time before these AI services claw back all that value for themselves and hold users and businesses hostage to one another, just like social media and e-commerce before. https://en.wikipedia.org/wiki/Enshittification

Unless these tools can be run locally independent of a service provider, we're just trading one boss for another.

replies(3): >>43537372 #>>43538500 #>>43543039 #

polishdude20 ◴[31 Mar 25 19:01 UTC] No.43538500[source]▶

>>43537239 #

The difference is that social media isn't special because of its hardware or software even. People are stuck on Facebook because everyone else is on it. It's network effects. LLMs currently have no network effects. Your friends and family aren't "on" ChatGPT so why use that over something else?

Once performance of a local setup is on par with online ones or good enough, that'll be game over for them.

replies(1): >>43544361 #

TeMPOraL ◴[01 Apr 25 08:51 UTC] No.43544361[source]▶

>>43538500 #

All it takes is for the "omg AI slop!!111" and "would someone think of my copyrights?" crowd to get their way, resulting in a conventional or legal ban on using AI user-agents on the Internet without express consent of a site/service provider. From there, it will be APIs all over again. Much like today, when you can't easily pipe your Facebook photo to your OneDrive and make a calendar invite, but you can use (for example) Zapier with Facebook Integration, OneDrive Integration and Google Calendar Integration, we'll end up with LLM/chatbot companies whose main value is in the exclusive set of integrations they offer.

So true, it's not going to be "I use PolishDude20GPTBook because my family and friends are on it". It's going to be, "I use PolishDude20GPTBook because they have contracts with Gazeta.pl, Onet, TVN24, OLX and Allegro, so I can use it to get local news and find best-priced products in a convenient way, whereas I can't use TeMPOraLxity for any of that".

Contracts over APIs, again.

As long as the "think of my copyright / AI slop oneoneone" crowd wins. It must not.

replies(1): >>43554594 #

hansmayer ◴[02 Apr 25 08:08 UTC] No.43554594[source]▶

>>43544361 #

The only reason that there is an "AI-slop-crowd" (as you call it) is that, well...there is a lot of (Gen-)AI slop. If the technology were as miraculous as it has been hyped up to be for several years now, there would be no such crowd. Everyone would just get on. If a tech just does what it says it does, everyone gets on board. The Internet is a great example of this, and so were smartphones after the iPhone moment. There was never an Anti-Internet-Crowd, I wonder why that might be?

replies(1): >>43561998 #

TeMPOraL ◴[02 Apr 25 21:43 UTC] No.43561998[source]▶

>>43554594 #

> There was never an Anti-Internet-Crowd, I wonder why that might be?

You forgot the dotcom boom? :)

The existence of AI slop has nothing to do with whether the tech itself is exceeding or falling short of its hype. It exists because it's good enough for advertising, the cancer on modern society that metastasizes to every new medium and technology, defiling and destroying everything it touches.

replies(1): >>43568273 #

hansmayer ◴[03 Apr 25 11:50 UTC] No.43568273[source]▶

>>43561998 #

No, what do you mean by the dotcom boom? That's a completely wrong comparison. The dotcom boom had more to do with the credit crunch of 2008 than with the inherent qualities of the Internet as a technology. With or without the dotcom boom, people continued to use the Internet, because it was useful to them. I don't remember anyone promising us that the Internet would be able to do this and that, if only... It just worked and everyone got what it was about, right from the start. Where is that moment with LLMs? Summarising my emails, inventing non-existent functions in my code? Hard pass :)
