Building Effective "Agents"

1. simonw ◴[20 Dec 24 22:25 UTC] No.42475700[source]▶

This is by far the most practical piece of writing I've seen on the subject of "agents" - it includes actionable definitions, then splits most of the value out into "workflows" and describes those in depth with example applications.

There's also a cookbook with useful code examples: https://github.com/anthropics/anthropic-cookbook/tree/main/p...

Blogged about this here: https://simonwillison.net/2024/Dec/20/building-effective-age...

replies(6): >>42475903 #>>42476486 #>>42477016 #>>42478039 #>>42478786 #>>42479343 #

2. 3abiton ◴[20 Dec 24 22:51 UTC] No.42475903[source]▶

>>42475700 (TP) #

I'm glad they are publishing their cookbooks recipes on github too. Openai used to be more active there.

replies(1): >>42476213 #

3. refulgentis ◴[21 Dec 24 00:21 UTC] No.42476459{3}[source]▶

>>42476213 #

Eh, let's nip this in the bud: we could end up in a "it feels like...", coupled to free association, cycle. :)

More substantively, we can check our vibe. OpenAI is just as active as it ever was w/notebooks. To an almost absurd degree. 5-10 commits a week. https://github.com/openai/openai-cookbook/activity

4. NeutralForest ◴[21 Dec 24 00:26 UTC] No.42476486[source]▶

>>42475700 (TP) #

Thanks for all the write-ups on LLMs, you're on top of the news and it makes it way easier to follow what's happening and the existing implementations by following your blog instead.

replies(1): >>42481308 #

5. th0ma5 ◴[21 Dec 24 02:19 UTC] No.42477016[source]▶

>>42475700 (TP) #

How do you protect from compounding errors?

replies(1): >>42478587 #

6. Animats ◴[21 Dec 24 07:21 UTC] No.42478039[source]▶

>>42475700 (TP) #

Yes, they have actionable definitions, but they are defining something quite different than the normal definition of an "agent". An agent is a party who acts for another. Often this comes from an employer-employee relationship.

This matters mostly when things go wrong. Who's responsible? The airline whose AI agent gave out wrong info about airline policies found, in court, that their "intelligent agent" was considered an agent in legal terms. Which meant the airline was stuck paying for their mistake.

Anthropic's definition: Some customers define agents as fully autonomous systems that operate independently over extended periods, using various tools to accomplish complex tasks.

That's an autonomous system, not an agent. Autonomy is about how much something can do without outside help. Agency is about who's doing what for whom, and for whose benefit and with what authority. Those are independent concepts.

replies(5): >>42478093 #>>42478201 #>>42479305 #>>42480149 #>>42481749 #

7. solidasparagus ◴[21 Dec 24 07:40 UTC] No.42478093[source]▶

>>42478039 #

That's only one of many definitions for the word agent outside of the context of AI. Another is something produces effects on the world. Another is something that has agency.

Sort of interesting that we've coalesced on this term that has many definitions, sometimes conflicting, but where many of the definitions vaguely fit into what an "AI Agent" could be for a given person.

But in the context of AI, Agent as Anthropic defines it is an appropriate word because it is a thing that has agency.

replies(1): >>42478308 #

8. simonw ◴[21 Dec 24 08:17 UTC] No.42478201[source]▶

>>42478039 #

Where did you get the idea that your definition there is the "normal" definition of agent, especially in the context of AI?

I ask because you seem very confident in it - and my biggest frustration about the term "agent" is that so many people are confident that their personal definition is clearly the one everyone else should be using.

replies(3): >>42478826 #>>42478885 #>>42486107 #

9. Animats ◴[21 Dec 24 08:45 UTC] No.42478308{3}[source]▶

>>42478093 #

> But in the context of AI, Agent as Anthropic defines it is an appropriate word because it is a thing that has agency.

That seems circular.

replies(1): >>42478992 #

10. tlarkworthy ◴[21 Dec 24 09:39 UTC] No.42478587[source]▶

>>42477016 #

read the article, close the feedback loop with something verifiable (e.g. tests)

replies(1): >>42481301 #

11. dmezzetti ◴[21 Dec 24 10:29 UTC] No.42478786[source]▶

>>42475700 (TP) #

If you're looking for a lightweight open-source framework designed to handle the patterns mentioned in this article: https://github.com/neuml/txtai

Disclaimer: I'm the author of the framework.

replies(1): >>42481123 #

12. PhilippGille ◴[21 Dec 24 10:37 UTC] No.42478826{3}[source]▶

>>42478201 #

Didn't he mention it was the court's definition?

But I'm not sure if that's true. The court didn't define anything, in contrary they only said that (in simplified terms) the chatbot was part of the website and it's reasonable to expect the info on their website to be accurate.

The closest I could find to the chatbot being considered an agent in legal terms (an entity like an employee) is this:

> Air Canada argues it cannot be held liable for information provided by one of its agents, servants, or representatives – including a chatbot.

Source: https://www.canlii.org/en/bc/bccrt/doc/2024/2024bccrt149/202...

13. JonChesterfield ◴[21 Dec 24 10:49 UTC] No.42478885{3}[source]▶

>>42478201 #

Defining "agent" as "thing with agency" seems legitimate to me, what with them being the same word.

replies(1): >>42479373 #

14. Nevermark ◴[21 Dec 24 11:19 UTC] No.42478992{4}[source]▶

>>42478308 #

It would only be circular if agency was only defined as “the property of being an agent”. That circle of reasoning isn’t being proposed as the formal definitions by anyone.

Perhaps you mean tautological. In which case, an agent having agency would be an informal tautology. A relationship so basic to the subject matter that it essentially must be true. Which would be the strongest possible type of argument.

15. pvg ◴[21 Dec 24 12:34 UTC] No.42479305[source]▶

>>42478039 #

AI people have been using a much broader definition of 'agent' for ages, though. One from Russel and Norvig's 90s textbook:

"Anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators"

https://en.wikipedia.org/wiki/Intelligent_agent#As_a_definit...

replies(1): >>42483983 #

16. adeptima ◴[21 Dec 24 12:41 UTC] No.42479343[source]▶

>>42475700 (TP) #

100% agree. I did a research on workflows, durable execution engines in context of Agents and RAGs. Put some links in a comment to the article below

17. simonw ◴[21 Dec 24 12:50 UTC] No.42479373{4}[source]▶

>>42478885 #

That logic doesn't work for me, because many words have multiple meanings. "Agency" can also be a noun that means an organization that you hire - like a design agency. Or it can mean the CIA.

I'm not saying it's not a valid definition of the term, I'm pushing back on the idea that it's THE single correct definition of the term.

replies(2): >>42481146 #>>42481493 #

18. jeffreygoesto ◴[21 Dec 24 15:22 UTC] No.42480149[source]▶

>>42478039 #

And "autonomous" is "having one's own laws".

https://www.etymonline.com/word/autonomous

19. threecheese ◴[21 Dec 24 18:04 UTC] No.42481123[source]▶

>>42478786 #

Hi David; I’ve seen txtai floating around, and just took a look. Would you say that it fits in a similar niche to something like llamaindex, but starting from a data/embeddings abstraction rather than a retrieval one (building on layers from there - like workflows, agents etc)?

replies(1): >>42483033 #

20. Nekit1234007 ◴[21 Dec 24 18:08 UTC] No.42481146{5}[source]▶

>>42479373 #

May I push back on the idea that a single word may mean (completely) different things?

replies(5): >>42481317 #>>42481479 #>>42481545 #>>42482230 #>>42485302 #

21. th0ma5 ◴[21 Dec 24 18:33 UTC] No.42481301{3}[source]▶

>>42478587 #

And who tests the tests, etc

22. th0ma5 ◴[21 Dec 24 18:35 UTC] No.42481308[source]▶

>>42476486 #

Probably the least critical and most myth pushing content imo.

replies(1): >>42481680 #

23. chrisweekly ◴[21 Dec 24 19:01 UTC] No.42481479{6}[source]▶

>>42481146 #

What's the single, unambiguous definition of the word "cleave"?

24. Der_Einzige ◴[21 Dec 24 19:03 UTC] No.42481493{5}[source]▶

>>42479373 #

Anything involving real agents likely does get your local spymaster interested. I assume all good AI work attracts the three letter types to make sure that the researcher isn’t trying to make AI that can make bioweapons…

25. ToValueFunfetti ◴[21 Dec 24 19:11 UTC] No.42481545{6}[source]▶

>>42481146 #

Aloha! Indeed, the language is being cleaved by such oversights. You can be in charge of overlooking this issue, effective ahead of two weeks from now. We'll peruse your results and impassionately sanction anything you call out (at least when it's unravelable). This endeavor should prove invaluable. Aloha!

26. herecomethefuzz ◴[21 Dec 24 19:35 UTC] No.42481680{3}[source]▶

>>42481308 #

> most myth pushing content

Care to elaborate?

replies(1): >>42481986 #

27. jcims ◴[21 Dec 24 19:46 UTC] No.42481749[source]▶

>>42478039 #

>Anthropic's definition: Some customers define agents as fully autonomous systems that operate independently over extended periods, using various tools to accomplish complex tasks.

But that's not their definition, and they explicitly describe that definition as an 'autonomous system'. Their definition comes in the next paragraph:

"At Anthropic, we categorize all these variations as agentic systems, but draw an important architectural distinction between workflows and agents:

* Workflows are systems where LLMs and tools are orchestrated through predefined code paths. Agents, on the other hand, are systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks.

* Agents, on the other hand, are systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks."

28. th0ma5 ◴[21 Dec 24 20:18 UTC] No.42481986{4}[source]▶

>>42481680 #

Lots of lists of the myths of LLMs out there https://masterofcode.com/blog/llms-myths-vs-reality-what-you... Every single post glosses over some aspect of these myths or posits they can be controlled or mitigated in some way, with no examples of anyone else finding applicability of the solutions to real world problems in a supportable and reliable way. When pushed, a myth in the neighborhood of those in the list above is pushed like the system will get better, or some classical computing mechanism will make up the difference, or that the problems aren't so bad, the solution is good enough in some ambiguous way, or that people or existing systems are just as bad when they are not.

replies(1): >>42482144 #

29. simonw ◴[21 Dec 24 20:42 UTC] No.42482144{5}[source]▶

>>42481986 #

I've written extensively about myths and misconceptions about LLMs, much of which overlaps with the observations in that post.

Here's my series about misconceptions: https://simonwillison.net/series/llm-misconceptions/

It doesn't seem to me that you're familiar with my work - you seem to be mixing me in with the vast ocean of uncritical LLM boosting content that's out there.

replies(1): >>42483664 #

30. simonw ◴[21 Dec 24 20:56 UTC] No.42482230{6}[source]▶

>>42481146 #

It's pretty clearly true.

Bank: financial institution, edge of a river, verb to stash something away

Spring: a season, a metal coil, verb to jump

Match: verb to match things together, noun a thing to start fires, noun a competition between two teams

Bat: flying mammal, stick for hitting things

And so on.

31. dmezzetti ◴[21 Dec 24 23:01 UTC] No.42483033{3}[source]▶

>>42481123 #

Hello - This is a great and accurate description. The idea is that out of the box there is a pipeline but each component is also customizable.

32. th0ma5 ◴[22 Dec 24 01:09 UTC] No.42483664{6}[source]▶

>>42482144 #

I'm thinking of the system you built to watch videos and parse JSON and the claims of that having a general suitability, which is simply dishonest imo. You seem to be confusing me with someone that hasn't been asking you repeatedly to address these kinds of concerns and the above series are a kind of potemkin set of things that don't intersect with your other work.

replies(2): >>42487527 #>>42490138 #

33. minasmorath ◴[22 Dec 24 02:26 UTC] No.42483983{3}[source]▶

>>42479305 #

That definition feels like it's playing on the verb, the idea of having "agency" in the world, and not on the noun, of being an "agent" for another party. The former is a philosophical category, while the latter has legal meaning and implication, and it feels somewhat disingenuous to continue to mix them up in this way.

replies(2): >>42484022 #>>42484981 #

34. AnimalMuppet ◴[22 Dec 24 02:36 UTC] No.42484022{4}[source]▶

>>42483983 #

Interesting. The best agents don't have agency, or at least don't use it.

You can think of this in video game terms: Players have agency. NPCs are "agencs", but don't have agency. But they're still not just objects in the game - they can move themselves and react to their environment.

replies(1): >>42486096 #

35. pvg ◴[22 Dec 24 07:55 UTC] No.42484981{4}[source]▶

>>42483983 #

In what way is it 'disingenuous'? You think Norvig is trying to deceive us about something? I'm not saying you have to agree with or like this definition but even if you think it's straight up wrong, 'disingenuous' feels utterly out of nowhere.

replies(1): >>42486056 #

36. rcxdude ◴[22 Dec 24 09:27 UTC] No.42485302{6}[source]▶

>>42481146 #

You're pushing up against the english language, then. 'let' has 46 entries in the dictionary (more if you cinsider obsolete usages).

37. minasmorath ◴[22 Dec 24 13:03 UTC] No.42486056{5}[source]▶

>>42484981 #

It's disingenuous in that it takes a word with a common understanding ("agent") and then conveniently redefines or re-etomologizes the word in an uncommon way that leads people to implicitly believe something about the product that isn't true.

Another great example of this trick is "essential" oils. We all know what the word "essential" means, but the companies selling the stuff use the word in the most uncommon way, to indicate the "essence" of something is in the oil, and then let the human brain fill in the gap and thus believe something that isn't true. It's techinically legal, but we have to agree that's not moral or ethical, right?

Maybe I'm wildly off base here, I have admittedly been wrong about a lot in my life up to this point. I just think the backlash that crops up when people realize what's going on (for example, the airline realizing that their chat bot does not in fact operate under the same rules as a human "agent," and that it's still a technology product) should lead companies to change their messaging and marketing, and the fact that they're just doubling down on the same misleading messaging over and over makes the whole charade feel disingenuous to me.

replies(1): >>42486702 #

38. minasmorath ◴[22 Dec 24 13:16 UTC] No.42486096{5}[source]▶

>>42484022 #

That's actually a great example of what I'm saying, because I don't think the NPCs are agents at all in the traditional sense of "One that acts or has the power or authority to act on behalf of another." Where would the NPC derive its power and authority from? There is a human somewhere in the chain giving it 100% of its parameters, and that human is ultimately 100% responsible for the configuration of the NPC, which is why we don't blame the NPC in the game for behaving in a buggy way, we blame the devs. To say the NPC has agency puts some level of metaphysical responsibility about decision making and culpability on the thing that it doesn't have.

An AI "agent" is the same way, it is not culpable for its actions, the humans who set it up are, but we're leading people to believe that if the AI goes off script then the AI is somehow responsible for its own actions, which is simply not true. These are not autonomous beings, they're technology products.

39. minasmorath ◴[22 Dec 24 13:18 UTC] No.42486107{3}[source]▶

>>42478201 #

I searched for the definition of "agent" and none of the results map to the way AI folks are using the word. It's really that simple, because we're marketing this stuff to non-tech people who already use words to mean things.

If we're redefining common words to market this stuff to non-tech people, and then we're conveniently not telling them that we redefined words, and thus allowing them to believe implicit falsehoods about the product that have serious consequences, we're being disingenuous.

40. pvg ◴[22 Dec 24 14:59 UTC] No.42486702{6}[source]▶

>>42486056 #

with a common understanding ("agent") and then conveniently redefines or re-etomologizes the word in an uncommon way that leads people to implicitly believe something about the product that isn't true.

What is the 'product' here? It's a university textbook. Like, where is the parallel between https://en.wikipedia.org/wiki/Intelligent_agent and 'essential oils'.

replies(1): >>42486840 #

41. minasmorath ◴[22 Dec 24 15:16 UTC] No.42486840{7}[source]▶

>>42486702 #

Oh, I have no issue with his textbook definition, I'm saying that it's now being used to sell products by people who know their normal consumer base isn't using the same definition and it conveniently misleads them into believing things about the product that aren't true.

Knowing that your target market (non-tech folks) isn't using the same language as you, but persisting with that language because it creates convenient sales opportunities due to the misunderstandings, feels disingenuous to me.

An "agent" in common terms is just someone acting on behalf of another, but that someone still has autonomy and moral responsibility for their actions. Like for example the airline customer service representative situation. AI agents, when we pull back the curtains, get down to brass tacks, whatever turn of phrase you want to use, are still ultimately deterministic models. They have a lot more parameters, and their determinism is offset by many factors of pseudo-randomness, but given sufficient information we could still predict every single output. That system cannot be an agent in the common sense of the word, because humans are still dictating all of the possible actions and outcomes, and the machine doesn't actually have the autonomy required.

If you fail to keep your tech product from going off-script, you're responsible, because the model itself isn't a non-deterministic causal actor. A human CSR on the other hand is considered by law to have the power and responsibility associated with being a causal actor in the world, and so when they make up wild stuff about the terms of the agreement, you don't have to honor it for the customer, because there's culpability.

I'm drifting into philosophy at this point, which never goes well on HN, but this is ultimately how our legal system determines responsibility for actions, and AI doesn't meet those qualifications. If we ever want it to be culpable for its own actions, we'll have to change the legal framework we all operate under.

Edit: Causal, not casual... Whoops.

Also, I think I'm confusing the situation a bit by mixing the legal distinctions between agency and autonomy with the common understanding of being an "agent" and the philosophical concept of agency and culpability and how that relates to the US legal foundations.

I need to go touch grass.

42. kordlessagain ◴[22 Dec 24 16:55 UTC] No.42487527{7}[source]▶

>>42483664 #

> dishonest Potemkin

It's like criticizing a "Hello World" program for not having proper error handling and security protocols. While those are important for production systems, they're not the point of a demonstration or learning example.

Your response seems to take these examples and hold them to the standard of mission-critical systems, which is a form of technical gatekeeping - raising the bar unnecessarily high for what counts as a "valid" technical demonstration.

43. simonw ◴[22 Dec 24 23:37 UTC] No.42490138{7}[source]▶

>>42483664 #

You mean this? https://simonwillison.net/2024/Oct/17/video-scraping/

To my surprise, on re-reading that post I didn't mention that you need to double-check everything it does. I guess I forgot to mention that at the time because I thought it was so obvious - anyone who's paying attention to LLMS should already know that you can't trust them to reliably extract this kind of information.

I've mentioned that a lot in my other writing. I frequently tell people that the tricky thing about working with LLMs is learning how to make use of a technology that is inherently unreliable.

Update: added a new note about reliability here: https://simonwillison.net/2024/Oct/17/video-scraping/#a-note...

Second update: I just noticed that I DID say "You should never trust these things not to make mistakes, so I re-watched the 35 second video and manually checked the numbers. It got everything right." in that post already.

> You seem to be confusing me with someone that hasn't been asking you repeatedly to address these kinds of concerns

Where did you do that?