
Building Effective AI Agents

(www.anthropic.com)
543 points by Anon84 | 27 comments
1. simonw ◴[] No.44302601[source]
This article remains one of the better pieces on this topic, especially since it clearly defines which definition of "AI agents" they are using at the start! They use: "systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks".

I also like the way they distinguish between "agents" and "workflows", and describe a bunch of useful workflow patterns.

I published some notes on that article when it first came out: https://simonwillison.net/2024/Dec/20/building-effective-age...

A more recent article from Anthropic is https://www.anthropic.com/engineering/built-multi-agent-rese... - "How we built our multi-agent research system". I found this one fascinating, I wrote up a bunch of notes on it here: https://simonwillison.net/2025/Jun/14/multi-agent-research-s...

replies(5): >>44302676 #>>44303599 #>>44304356 #>>44305116 #>>44305898 #
2. juddlyon ◴[] No.44302676[source]
Thank you for the extra notes, this is top of mind for me.
3. smoyer ◴[] No.44303599[source]
The article on the multi-agent research system is awesome. I do disagree with one statement in the Building Effective AI Agents article: building your initial system without a framework sounds nice as an educational endeavor, but the first benefit you get from a good framework is the ability to easily try out different (and cross-vendor) LLMs.
replies(3): >>44305075 #>>44307235 #>>44309698 #
4. koakuma-chan ◴[] No.44304356[source]
Does anyone know which AI agent framework Anthropic uses? It doesn't seem like they ever released one of their own.
replies(2): >>44304965 #>>44305508 #
5. rockwotj ◴[] No.44304965[source]
Just write the for loop to react to tool calls? It’s not very much code.
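
Roughly this, with the Anthropic Python SDK (a sketch; TOOLS and run_tool stand in for your own tool schemas and dispatcher):

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]

    while True:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            tools=TOOLS,  # your own tool schemas (stand-in)
            messages=messages,
        )
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            break  # the model gave a final answer instead of calling a tool
        # execute each requested tool and feed the results back
        messages.append({
            "role": "user",
            "content": [
                {"type": "tool_result",
                 "tool_use_id": block.id,
                 "content": run_tool(block.name, block.input)}  # your dispatcher (stand-in)
                for block in response.content
                if block.type == "tool_use"
            ],
        })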
replies(1): >>44305510 #
6. miki123211 ◴[] No.44305075[source]
This is why you use a library (not a framework) that provides an abstraction over different LLMs.

I'm personally a fan of litellm, but I'm sure alternatives exist.
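
For example, a minimal litellm sketch (model strings are illustrative):

    from litellm import completion

    prompt = [{"role": "user", "content": "Say hi in five words."}]

    # same call shape across vendors; swapping models is just a string change
    for model in ("gpt-4o-mini", "anthropic/claude-3-5-haiku-20241022"):
        response = completion(model=model, messages=prompt)
        print(model, "->", response.choices[0].message.content)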

7. swyx ◴[] No.44305116[source]
One half of the authors of Building Effective Agents also came by AIE to give a well-received talk version of this article: https://www.youtube.com/watch?v=D7_ipDqhtwk
8. ankit219 ◴[] No.44305508[source]
From what it looks like, it's one main LLM (the orchestrator, the one you send your query to) which calls other LLMs via tool calls. The tools are capable of calling LLMs too, and can have specific instructions, but it's mostly the orchestrator deciding what they should be researching and assigning them specific subqueries. There is a limited depth / number of levels of search queries too; you should see the prompt they use[1].

One cool example of this in action: use Claude Code and ask it to search for something. In a verbose setting, it calls an MCP tool to help with the search. The tool returns a summary of the results with the relevant links (not the raw search result text). A similar, albeit more robust, method is used when Claude is doing deep research as well.

[1]: https://github.com/anthropics/anthropic-cookbook/blob/main/p...
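
In tool-schema terms, the orchestrator might see something like this (hypothetical names and fields; the real prompts are in the cookbook link above):

    # hypothetical schema for a sub-agent exposed to the orchestrator as a tool
    RUN_SUBAGENT = {
        "name": "run_research_subagent",
        "description": ("Delegate one focused subquery to a research sub-agent. "
                        "Returns a summary of findings with relevant links, "
                        "not raw search results."),
        "input_schema": {
            "type": "object",
            "properties": {
                "subquery": {"type": "string",
                             "description": "The specific question to research"},
                "remaining_depth": {"type": "integer",
                                    "description": "How many further levels of search "
                                                   "this sub-agent may spawn"},
            },
            "required": ["subquery"],
        },
    }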

9. koakuma-chan ◴[] No.44305510{3}[source]
They mentioned handoffs, sub-agents, concurrent tool calls, etc. You could write that yourself, but you would be inventing your own framework.
replies(2): >>44307411 #>>44307467 #
10. kodablah ◴[] No.44305898[source]
I believe the definition of workflows in this article is inaccurate. Workflows in modern engines do not take predefined code paths, and agents are effectively the same as workflows in these cases. The redefinition of workflows seems to be an attempt to differentiate, but for the most part an agent is nothing more than a workflow that loops, dynamically invoking things based on LLM responses. Modern workflow engines are very dynamic.
replies(2): >>44306120 #>>44306707 #
11. sothatsit ◴[] No.44306120[source]
I think the distinction is more about the "level of railroading".

Workflows have a lot more structure and rules about information and control flow. Agents, on the other hand, are often given a set of tools and a prompt. They are much more free-form.

For example, a workflow might define a fuzzy rule like "if customer issue is refund, go to refund flow," while an agent gets customer service tools and figures out how to handle each case on its own.

To me, this is a meaningful distinction to make. Workflows can be more predictable and reliable. Agents have more freedom and can tackle a greater breadth of tasks.
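
A toy sketch of the difference (all names hypothetical; classify_issue is a single constrained LLM call):

    def handle_ticket(ticket: str) -> str:
        # workflow: the LLM only picks a branch; the code paths are predefined
        category = classify_issue(ticket)  # returns e.g. "refund" | "shipping" | "other"
        if category == "refund":
            return refund_flow(ticket)
        if category == "shipping":
            return shipping_flow(ticket)
        return escalate_to_human(ticket)

    # an agent, by contrast, would get refund/shipping/escalation *tools*
    # plus a prompt, and decide the sequence of calls itself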

replies(2): >>44308850 #>>44311442 #
12. simonw ◴[] No.44306707[source]
You appear to be making the mistake of assuming that the only valid definition for the term "workflow" is the definition used by software such as https://airflow.apache.org/

https://www.merriam-webster.com/dictionary/workflow thinks the word dates back to 1921.

There's no reason Anthropic can't take that word and present their own alternative definition for it in the context of LLM tool usage, which is what they've done here.

replies(1): >>44311420 #
13. XenophileJKO ◴[] No.44307235[source]
Having built several systems serving massive user bases with LLMs, I think the ability to swap out APIs just isn't the bottleneck.. like ever. It is always the behavioral issues or capability differences between models.

The frameworks just usually add more complexity, obscurity, and API misalignment.

Now the equation can change IF you are getting a lot of observability, experimentation, etc. I think we are just reaching that point of utility where it is a real question whether you should use the framework by default.

For example, I built a first version of a product with my own Java code hooking right into an API. I was able to deliver the product quickly, with a clean architecture, observability, etc. Then, once the internal ecosystem was aligned on a framework (one mentioned in the article), a team took up migrating it to Python on that framework. It still isn't complete; it just introduces a lot of abstraction layers that you have to adapt to your internal systems, your internal observability setup, and everything else the rest of your applications do.

People underestimate that cost. So by default to get your V0 product off the ground (if you are not a complete startup), just use the API. That is my advice.

replies(3): >>44307883 #>>44307887 #>>44310151 #
14. risyachka ◴[] No.44307411{4}[source]
It's still just a loop.

Also, funny how "parallel calls" became a feature in AI? Like wow, yeah, we've been able to call functions in parallel since the dawn of CS.

15. crazylogger ◴[] No.44307467{4}[source]
A sub-agent is just another LLM loop that you import and provide as a tool to your orchestrator LLM. For example, in Claude Code, the sub-agent is a tool called "Task(<description>)" made available to the main LLM (the one you chat with) along with other tools like patch_file and web_search.

A concurrent tool call is when the LLM writes multiple tool calls instead of one, and you can program your app to execute those sequentially or concurrently. This is a trivial concept.

The "agent framework" layer here is so thin it might as well not exist, and you can use Anthropic's/OAI's SDK directly. I don't see a need for fancy graphs with circles here.

replies(1): >>44308812 #
16. davedx ◴[] No.44307883{3}[source]
This aligns with my experience (specifically with langgraph). I actually find it a depressing sign of the times that your prototype was in Java and the "production" version is going to be in Python.

My experience with langgraph is that you spend so much time just fixing stupid runtime type errors, because the state of every graph is a stupid JSON blob with very minimal typing, and it's so hard figuring out how data moves through the system. Combined with Python's already weak type support, and the fact that you're usually dealing with long-running processes where things break mid- or end-of-process, development becomes quite awful. AI coding assistants only help so much. Tests are hard to write because these frameworks inevitably lean into the dynamic nature of Python.

I just can't understand why people are choosing to build these huge complex systems in an untyped language when the only AI or ML is API calls... or, very occasionally, some lightweight embeddings.

17. ◴[] No.44307887{3}[source]
18. koakuma-chan ◴[] No.44308812{5}[source]
> The "agent framework" layer here is so thin it might as well not exist

There are plenty of things you need to make an AI agent that I wouldn't want to re-implement or copy and paste each time. The most annoying is automatic conversation history summarization (e.g. I accidentally wasted $60 with the latest OpenAI realtime model, because costs go up very quickly as the conversation history grows). And I'm sure we'll discover more things like that in the future.
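
(The basic shape of that, for anyone rolling their own; count_tokens and summarize_with_llm are stand-ins:)

    def compact_history(messages, budget=50_000, keep_recent=10):
        """Collapse old turns into one summary message once the history
        gets expensive, keeping the most recent turns verbatim."""
        if count_tokens(messages) <= budget or len(messages) <= keep_recent:
            return messages
        old, recent = messages[:-keep_recent], messages[-keep_recent:]
        summary = summarize_with_llm(old)  # one cheap LLM call over the old turns
        return [{"role": "user",
                 "content": f"Summary of the conversation so far: {summary}"}] + recent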

replies(1): >>44308883 #
19. gwd ◴[] No.44308850{3}[source]
Just to emphasize your point, below is a workflow I wrote for an LLM recently, to do language tagging (e.g., of vocab, grammar structures, etc.). It's very different from what you'd think of as an "agent", where the LLM has tools and can take initiative.

LLMs are amazingly powerful in some ways, but without this kind of "scaffolding" they're simply not reliable enough to make consistent choices.

---

1. Here are: a) a "language schema" describing what kinds of tags I want and why, with examples; b) the text I want you to tag; and c) a list of previously-defined tags which could potentially be relevant (simple string match).

List for yourself which pre-existing tags you plan to use when doing tagging.

[LLM generates a list of tags]

2. Here is a,b,c from above, and d) your own tag list

Please write a draft tag.

[LLM writes a draft]

3. Here is a-d from above, plus e) your first draft, and f) Some programmatically-generated "linter" warnings which may or may not be violations of the schema.

Please check over your draft to make sure it follows the schema.

[LLM writes a new draft]

Agent checks for "hard" rules, like making sure there's a 1-1 correlation between the text and the tags. If no rules are violated, move to step 5.

4. Here is a-e from above, plus g) your most recent draft, and h) known rule violations. Please fix the errors.

[LLM writes a new draft]

Repeat 4 until no hard rules are broken (sketched in code after this list).

5. [and so on]
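
(Steps 3 and 4 as code, roughly; revise, lint_tags, and check_hard_rules are stand-ins for the LLM call and the programmatic checks:)

    MAX_FIXES = 5  # cap the repair loop so a stubborn draft can't spin forever

    draft = revise(schema, text, known_tags, plan)          # step 2: first draft
    draft = revise(schema, text, known_tags, plan,
                   draft=draft, warnings=lint_tags(draft))  # step 3: soft linter pass
    for _ in range(MAX_FIXES):                              # step 4: hard-rule loop
        violations = check_hard_rules(text, draft)          # e.g. the 1-1 text/tag check
        if not violations:
            break                                           # proceed to step 5
        draft = revise(schema, text, known_tags, plan,
                       draft=draft, errors=violations)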

replies(1): >>44311446 #
20. akadeb ◴[] No.44308883{6}[source]
I would highly recommend Gemini 2.5 Pro too for its speech quality. It's priced lower and the quality is top-notch on their API. I made an implementation here in case you're interested: https://www.github.com/akdeb/ElatoAI but it's on hardware, so maybe not totally relevant.
replies(1): >>44308949 #
21. koakuma-chan ◴[] No.44308949{7}[source]
I'm using LiveKit, and I have indeed tested Gemini, but it appears to be broken, or at least incompatible with OpenAI. Not sure if this is a LiveKit issue or a Gemini issue. Anyway, I decided to go back to just using LLM, STT, and TTS as separate nodes, but I've also been looking into the Deepgram Voice Agent API, though LiveKit doesn't support it (yet?).
22. retinaros ◴[] No.44309698[source]
Not only that, it also readies you for production if the framework has constructs like observability, evals, deployment, cloud security, etc.
23. IanCal ◴[] No.44310151{3}[source]
> I think the ability to swap out APIs just isn't the bottleneck.. like ever

It's a massive pain in the arse for testing though. Checking which of X things performs best for your use case is quite annoying if you have to write X implementations. Having one setup where you swap out keys and some vars makes this massively easier.

replies(1): >>44398692 #
24. kodablah ◴[] No.44311420{3}[source]
Right, I am saying I don't think their definition is accurate given the modern use of the term. It's an artificially limited definition to fit a narrative. An agent is nothing more than a very limited workflow.
25. kodablah ◴[] No.44311442{3}[source]
> Agents, on the other hand, are often given a set of tools and a prompt. They are much more free-form.

This describes how workflows are used in modern systems, in my experience. Workflows are often not predictable; they often execute one of a set of tools based on a response from a previous invocation (e.g. an LLM call).

26. ◴[] No.44311446{4}[source]
27. kfajdsl ◴[] No.44398692{4}[source]
This is easily solved with a thin wrapper for calling openai/anthropic/google/whatever that has the same interface between model providers (except for unique capabilities). You don't need a whole framework for this.
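
Something like this sketch (streaming, tool calls, and error handling omitted):

    from typing import Protocol

    class ChatModel(Protocol):
        def complete(self, messages: list[dict]) -> str: ...

    class AnthropicChat:
        def __init__(self, model: str):
            import anthropic
            self.client, self.model = anthropic.Anthropic(), model
        def complete(self, messages: list[dict]) -> str:
            r = self.client.messages.create(model=self.model,
                                            max_tokens=1024, messages=messages)
            return r.content[0].text

    class OpenAIChat:
        def __init__(self, model: str):
            import openai
            self.client, self.model = openai.OpenAI(), model
        def complete(self, messages: list[dict]) -> str:
            r = self.client.chat.completions.create(model=self.model,
                                                    messages=messages)
            return r.choices[0].message.content

    # swapping the provider under test is one line:
    model: ChatModel = OpenAIChat("gpt-4o-mini")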