
423 points serjester | 1 comments
simonw ◴[] No.43535919[source]
Yeah, the "book a flight" agent thing is a running joke now - it was a punchline in the Swyx keynote for the recent AI Engineer event in NYC: https://www.latent.space/p/agent

I think this piece is underestimating the difficulty involved here though. If only it was as easy as "just pick a single task and make the agent really good at that"!

The problem is that if your UI involves human beings typing or talking to you in a human language, there is an unbounded set of ways things could go wrong. You can't test against every possible variant of what they might say. Humans are bad at clearly expressing things, but even worse is the challenge of ensuring they have a concrete, accurate mental model of what the software can and cannot do.

replies(12): >>43536068 #>>43536088 #>>43536142 #>>43536257 #>>43536583 #>>43536731 #>>43537089 #>>43537591 #>>43539058 #>>43539104 #>>43539116 #>>43540011 #
noodletheworld ◴[] No.43536088[source]
Isn't the point he's making:

>> Yet too many AI projects consistently underestimate this, chasing flashy agent demos promising groundbreaking capabilities—until inevitable failures undermine their credibility.

This is the problem with the 'MCP for Foo' posts that have been appearing recently.

Adding a capability to your agent that the agent can't use just gives us exactly that:

> inevitable failures undermine their credibility

It should be relatively easy for everyone to agree that giving agents an unlimited set of arbitrary capabilities will just make them terrible at everything; and that promising that giving them these capabilities will make them better is:

A) false

B) undermining the credibility of agentic systems

C) undermining the credibility of the people making these promises

...I get it, it is hard to write good agent systems, but surely, a bunch of half-baked, function-calling wrappers that don't really work... like, it's not a good look right?

It's just vibe coding for agents.

I think it's quite reasonable to say, if you're building a system now, then:

> The key to navigating this tension is focus—choosing a small number of tasks to execute exceptionally well and relentlessly iterating upon them.

^ This seems like exceptionally good advice. If you can't make something that's actually good by iterating on it until it is good and it does work, then you're going to end up being a devin (i.e. an over-promised, over-hyped failure).

replies(1): >>43536276 #