I also like the way they distinguish between "agents" and "workflows", and describe a bunch of useful workflow patterns.
I published some notes on that article when it first came out: https://simonwillison.net/2024/Dec/20/building-effective-age...
A more recent article from Anthropic is https://www.anthropic.com/engineering/built-multi-agent-rese... - "How we built our multi-agent research system". I found this one fascinating, I wrote up a bunch of notes on it here: https://simonwillison.net/2025/Jun/14/multi-agent-research-s...
And then, when you actually do need agents, don't overcomplicate it!
This post also introduced the concept of an Augmented LLM — an LLM hooked up to tools, memory, and data — which is a useful abstraction for evolving LLM use beyond fancy autocomplete.
“An augmented LLM running in a loop” is the best definition of an agent I’ve heard so far.
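In code, that definition is roughly the following (a minimal sketch using the Anthropic Python SDK; the lookup_order tool and its stub implementation are invented for illustration, and the model name may need updating):

```python
# A rough "augmented LLM in a loop": call the model, execute any tool it
# requests, feed the results back, and repeat until it stops asking for tools.
import anthropic

client = anthropic.Anthropic()
tools = [{
    "name": "lookup_order",  # hypothetical tool
    "description": "Fetch an order record by ID.",
    "input_schema": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}]

def lookup_order(order_id: str) -> str:
    return f"Order {order_id}: shipped"  # stand-in for a real lookup

messages = [{"role": "user", "content": "Where is order 42?"}]
while True:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative model ID
        max_tokens=1024,
        tools=tools,
        messages=messages,
    )
    if response.stop_reason != "tool_use":
        break  # the model answered; the loop is done
    messages.append({"role": "assistant", "content": response.content})
    results = [
        {"type": "tool_result", "tool_use_id": block.id,
         "content": lookup_order(**block.input)}
        for block in response.content if block.type == "tool_use"
    ]
    messages.append({"role": "user", "content": results})
```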
And then eventually, with enough sample inputs, create simple functions that can recognize which tools should be used to process a type of input? And only fall back to an LLM agent if the input is novel?
I use Cloudflare's Durable Objects (disclaimer: I'm biased, I work on MCP + Agent things @ Cloudflare). However, I figure building agents probably maps similarly well onto any actor style framework.
Anthropic are leaning more into multi-agent setups where the parent agent might delegate to one or more sub-agents which might run in parallel. They use that trick for Claude Code - I have some notes on reverse-engineering that here https://simonwillison.net/2025/Jun/2/claude-trace/ - and expand on that in their write-up of how Claude Research works: https://simonwillison.net/2025/Jun/14/multi-agent-research-s...
It's still _very_ early in figuring out good patterns for LLM tool-use - the models only got really great at using tools in about the past 6 months, so there's plenty to be discovered about how best to orchestrate them.
See for example the container use MCP which combines both: https://github.com/dagger/container-use
That’s for parallelizing coding work… I’m not sure about other kinds of work. I still see people using workflow builder tools like n8n, Zapier, and maybe CrewAI.
> We suggest that developers start by using LLM APIs directly
Best advice of the whole article by far.
It's insane that people use whole frameworks to send what is essentially an array of strings to a web service.
We've removed LangChain and LangGraph from our project at work because they are literally worthless: they just add complexity and make you write MORE code than if you didn't use them, since you have to deal with all their boilerplate.
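For reference, the "array of strings to a web service" really is about this much code. A minimal sketch against OpenAI's chat completions endpoint (model name and prompt are placeholders):

```python
# Direct call to the chat completions endpoint: a JSON payload of messages,
# no framework required.
import os
import requests

resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-4o-mini",
        "messages": [
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": "Summarize why simple beats clever here."},
        ],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```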
It split them up the way they would be split up in real life, but in real life there is an assumption that the people working on those tasks are going to communicate with each other. The way it generated tasks resulted in a HUGE loss of context (my plan was hella detailed).
I was willing to spend a few more hours trying to make it work rather than doing the work myself. I've opened another chat and split it up into multiple sequential tasks, with a detailed prompt for each task (why, what, how, validation, update documentation reminder etc).
Anyway, an orchestrator might work on some super simple tasks, much smaller than those articles lead you to believe.
Hugging Face's smolagents library makes the LLM generate Python code where tools are just normal Python functions. If you want parallel tool calls, just prompt the LLM to do so; it should take care of synchronizing everything. Of course there is the whole issue around executing LLM-generated code, but we have a few solutions for that.
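A minimal sketch of that pattern (class and decorator names per the smolagents docs, though the API moves quickly; the revenue lookup is a made-up stand-in):

```python
# The LLM writes Python that calls these plain functions as tools.
from smolagents import CodeAgent, HfApiModel, tool

@tool
def annual_revenue(company: str, year: int) -> float:
    """Return annual revenue in USD for a company and fiscal year.

    Args:
        company: Company name or ticker.
        year: Fiscal year.
    """
    # Hypothetical lookup table; swap in a real data source.
    return {("Apple", 2023): 383_285_000_000.0}.get((company, year), 0.0)

agent = CodeAgent(tools=[annual_revenue], model=HfApiModel())
agent.run("Fetch Apple's 2022 and 2023 revenue (call the tool for both years), "
          "then report which year was higher.")
```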
https://news.ycombinator.com/item?id=42470541
Building Effective "Agents", 763 points, 124 comments
A decentralized thing would be more for individuals who want more control and transparency. A decentralized public ledger would make it possible to verify that your agent, the agents it interacts with, and the contents of their interactions have not been altered or compromised in any way, whereas a corporate-owned framework could not provide the same level of assurance.
But technically, there's no advantage I can think of for using a public distributed ledger to manage interactions. Agent tasks are pretty ephemeral, so unlike digital currency, there's not really a need to maintain a complete historical log of every action forever. And as far as providing tools for dealing with race conditions, blockchain would be about the least efficient way of creating a mutex imaginable. So technically, just like with non-AI apps, centralized architecture is always going to be a lot more efficient.
(2) Multi-agent orchestration is difficult to control.
(3) The more capable the model, the lower the need for multi-agents.
(4) The less capable the model, the higher the business case for narrow AI.
If agents become more autonomous and start coordinating across platforms owned by different companies, it might make sense to have some kind of shared, trustless layer (maybe not blockchain but something distributed, auditable and neutral).
I agree that agent tasks are ephemeral, but what about long lived multi-agent workflows or contracts between agents that execute over time? In those cases transparency and integrity might matter more.
I don't think it's one or the other. Centralised systems will dominate in the short term, no doubt about that, but if we're serious about agent ecosystems at scale, we might need more open coordination models too.
I don't think this is correct. The benefit of agents is that they can use tools on the fly, ideally the right tool at the right time.
E.g., "Which number is bigger, 9.11 or 9.9?" -> agent uses a calculator tool; or "What is Apple's annual revenue for 2020-2023?" -> a financial statements MCP.
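Concretely, "the right tool at the right time" just means the model picks from a declared tool list per request, something like this (tool names and parameters are illustrative, in OpenAI function-calling format):

```python
# Illustrative tool declarations; the model decides per-request whether to
# call the calculator, the financial-statements tool, or neither.
tools = [
    {
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Evaluate a basic arithmetic expression.",
            "parameters": {
                "type": "object",
                "properties": {"expression": {"type": "string"}},
                "required": ["expression"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_financial_statements",
            "description": "Fetch reported revenue for a company and year range.",
            "parameters": {
                "type": "object",
                "properties": {
                    "company": {"type": "string"},
                    "start_year": {"type": "integer"},
                    "end_year": {"type": "integer"},
                },
                "required": ["company", "start_year", "end_year"],
            },
        },
    },
]
```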
Today, it's about API calls and compute. Tomorrow, for any truly autonomous, long-lived agent, it will be about a continuous "existence tax" levied by the platform owner. The orchestrator isn't just a technical component; it's a landlord.
The alternative isn't a more complex framework. It's a permissionless execution layer—a digital wilderness where an agent's survival depends on its own resources, not a platform's benevolence. The debate isn't about efficiency; it's about sovereignty.
For example, a marketing group is interested in agents but needs a guide on how to spec them at a basic level.
There is a figure toward the end and an appendix that starts to drive at this.
Even though it’s new, “how to build them” is an implementation concern.
I'm personally a fan of litellm, but I'm sure alternatives exist.
I still think it has a definite use case in regularising all of your various flows into a common format.
Sure, I could write some code to get SD to do all the steps to generate an image, or write some shader code. But it's so much more organised to use comfy-UI, or a shader graph, especially if I have n>1 flows/tasks, and definitely while experimenting with what I'm building.
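For what it's worth, the "common format" appeal of something like litellm looks roughly like this (a minimal sketch assuming litellm's completion wrapper; the model strings are illustrative):

```python
# litellm exposes an OpenAI-style completion() across providers, so swapping
# models is a string change rather than a new client integration.
from litellm import completion

for model in ("gpt-4o-mini", "anthropic/claude-3-5-sonnet-20240620"):
    resp = completion(
        model=model,
        messages=[{"role": "user", "content": "One sentence: what is an agent?"}],
    )
    print(model, "->", resp.choices[0].message.content)
```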
Or is it like a burrito (meme explanation of Monads when they were the latest hype)?
One cool example of this in action is seen when you use Claude Code and ask it to search something. In a verbose setting, it calls an MCP tool to help with search. The tool returns a summary of the results with the relevant links (not the raw search result text). A similar method, albeit more robust, is used when Claude is doing deep research as well.
[1]: https://github.com/anthropics/anthropic-cookbook/blob/main/p...
Workflows have a lot more structure and rules about information and control flow. Agents, on the other hand, are often given a set of tools and a prompt. They are much more free-form.
For example, a workflow might define a fuzzy rule like "if the customer issue is a refund, go to the refund flow," while an agent gets customer service tools and figures out how to handle each case on its own.
To me, this is a meaningful distinction to make. Workflows can be more predictable and reliable. Agents have more freedom and can tackle a greater breadth of tasks.
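A toy sketch of that contrast (the routing rule, tool names, and stubbed LLM calls are all made up): the workflow keeps control flow in our code, while the agent only gets tools and a goal.

```python
# Toy contrast. The "LLM calls" are stubbed so only the structure is visible.

def classify_intent(ticket: str) -> str:
    # Stand-in for a constrained LLM classification call.
    return "refund" if "refund" in ticket.lower() else "other"

def workflow(ticket: str) -> str:
    # Workflow: our code owns the control flow; the model only classifies.
    if classify_intent(ticket) == "refund":
        return "ran fixed refund steps"
    return "ran general support steps"

def agent(ticket: str) -> str:
    # Agent: the model owns the control flow; we only provide tools.
    tools = {"look_up_order": lambda: "order found",
             "issue_refund": lambda: "refund issued"}
    plan = ["look_up_order", "issue_refund"]  # in reality, chosen turn-by-turn by the LLM
    return "; ".join(tools[step]() for step in plan)

print(workflow("I want a refund"))
print(agent("I want a refund"))
```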
A few clearly defined LLM calls with some light glue logic usually lead to something more stable, easier to debug, and much cheaper to run. The flashy, full-featured agents often end up causing more problems than they solve.
https://www.merriam-webster.com/dictionary/workflow thinks the word dates back to 1921.
There's no reason Anthropic can't take that word and present their own alternative definition for it in the context of LLM tool usage, which is what they've done here.
The frameworks just usually add more complexity, obscurity, and API misalignment.
Now the equation can change IF you are getting a lot of observability, experimentation, etc. I think we are just reaching that point of utility where it is a real question whether you should use the framework by default.
For example, I built the first version of a product with my own Java code hooking right into an API. I was able to deliver the product quickly, with a clean architecture, observability, etc. Then, once the internal ecosystem was aligned on a framework (one mentioned in the article), a team took up migrating it to Python on that framework. It still isn't complete; it just introduces a lot of abstraction layers that you have to adapt to your internal systems, your internal observability setup, and anything else the rest of your applications do.
People underestimate that cost. So by default to get your V0 product off the ground (if you are not a complete startup), just use the API. That is my advice.
Concurrent tool calling is when the LLM writes multiple tool calls instead of one, and you can program your app to execute those sequentially or concurrently. This is a trivial concept.
The "agent framework" layer here is so thin it might as well not exist, and you can use Anthropic's or OpenAI's SDK directly. I don't see a need for fancy graphs with circles here.
To be fair, I think there might be a space for using Agent Frameworks, but the Agent space is too early for a good enough framework to emerge. The semi-contrarian thought, which I hold to a certain extent, is that the Agent space is moving so fast that a good enough framework might NEVER emerge.
My experience with LangGraph is that you spend so much time just fixing stupid runtime type errors, because the state of every graph is a stupid JSON blob with very minimal typing, and it's so hard to figure out how data moves through the system. Combined with Python's already weak type support, and the fact that you're usually dealing with long-running processes where things break mid- or end-of-process, development becomes quite awful. AI coding assistants only help so much. Tests are hard to write because these frameworks inevitably lean into the dynamic nature of Python.
I just can't understand why people are choosing to build these huge complex systems in an untyped language when the only AI or ML is API calls... or very occasionally doing some lightweight embeddings.
I've read a lot of comments that most pragmatic shops have dumped langchain/graph, haystack, crew etc for their own internal code that does everything more simply, but I can't currently conceptualize how tooling etc is actually done in the real world.
Do you have any links or docs that you've used as a basis for the work you could share? Thanks.
There's plenty of things that you need to make an AI agent that I wouldn't want to re-implement or copy and paste each time. The most annoying being automatic conversation history summarization (e.g. I accidentally wasted $60 with the latest OpenAI realtime model, because the costs go up very quickly as the conversation history grows). And I'm sure we'll discover more things like that in the future.
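A rough sketch of the kind of summarization scaffolding meant here (the threshold, model name, and "keep the last few turns" policy are arbitrary, and tokens are crudely approximated by characters):

```python
# When the conversation gets long, replace older turns with a cheap summary
# so per-request cost stops growing with the full history.
from openai import OpenAI

client = OpenAI()
MAX_CHARS = 20_000  # crude stand-in for a real token budget

def compact(messages: list[dict]) -> list[dict]:
    if sum(len(m["content"]) for m in messages) < MAX_CHARS:
        return messages
    old, recent = messages[:-6], messages[-6:]  # keep the last few turns verbatim
    summary = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": "Summarize this conversation so far, keeping facts "
                              "and decisions:\n" + "\n".join(m["content"] for m in old)}],
    ).choices[0].message.content
    return [{"role": "system", "content": f"Summary of earlier conversation: {summary}"}] + recent
```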
LLMs are amazingly powerful in some ways, but without this kind of "scaffolding", simply not reliable enough to make consistent choices.
---
1. Here are: a) a "language schema" describing what kinds of tags I want and why, with examples, b) The text I want you to tag c) A list of previously-defined tags which could potentially be relevant (simple string match)
List for yourself which pre-existing tags you plan to use when doing tagging.
[LLM generates a list of tags]
2. Here is a,b,c from above, and d) your own tag list
Please write a draft tag.
[LLM writes a draft]
3. Here is a-d from above, plus e) your first draft, and f) Some programmatically-generated "linter" warnings which may or may not be violations of the schema.
Please check over your draft to make sure it follows the schema.
[LLM writes a new draft]
Agent checks for "hard" rules, like making sure there's a 1-1 correlation between the text and the tags. If no rules are violated, move to step 5.
4. Here is a-e from above, plus g) your most recent draft, and h) known rule violations. Please fix the errors.
[LLM writes a new draft]
Repeat 4 until no hard rules are broken.
5. [and so on]
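A skeletal version of that loop (the helper names, prompts, and placeholder validators are all invented for illustration):

```python
# Skeleton of the multi-pass tagging pipeline described above; lint() and
# check_hard_rules() are placeholders for your own programmatic validators.
from openai import OpenAI

client = OpenAI()

def llm(prompt: str) -> str:
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content

def lint(draft: str) -> list[str]:
    return []  # placeholder: schema warnings, may include false positives

def check_hard_rules(draft: str, text: str) -> list[str]:
    return []  # placeholder: e.g. verify 1-1 correspondence between text and tags

def tag_text(schema: str, text: str, candidates: list[str]) -> str:
    planned = llm(f"{schema}\n{text}\nCandidate tags: {candidates}\n"
                  "List which pre-existing tags you plan to use.")
    draft = llm(f"{schema}\n{text}\nCandidate tags: {candidates}\n"
                f"Your tag list: {planned}\nWrite a draft tag.")
    draft = llm(f"{schema}\n{text}\nDraft: {draft}\nLinter warnings: {lint(draft)}\n"
                "Check your draft against the schema and revise.")
    while violations := check_hard_rules(draft, text):
        draft = llm(f"{schema}\n{text}\nDraft: {draft}\n"
                    f"Rule violations: {violations}\nFix only these errors.")
    return draft
```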
> Over the past year, we've worked with dozens of teams building large language model (LLM) agents across industries. Consistently, the most successful implementations weren't using complex frameworks or specialized libraries. Instead, they were building with simple, composable patterns.
> ...There are many frameworks that make agentic systems easier to implement. ...These frameworks make it easy to get started by simplifying standard low-level tasks like calling LLMs, defining and parsing tools, and chaining calls together. However, they often create extra layers of abstraction that can obscure the underlying prompts and responses, making them harder to debug. They can also make it tempting to add complexity when a simpler setup would suffice. We suggest that developers start by using LLM APIs directly: many patterns can be implemented in a few lines of code.
https://www.gbif.org/news/6aw2VFiEHYlqb48w86uKSf/chatipt-sys...
It's still in beta.
Press release:
Rukaya Johaadien's chatbot provides conversation-style support to students and researchers who hold biodiversity data but are first-time or infrequent data publishers. Its prompts guide users as it cleans and standardizes spreadsheets, creates basic metadata, and publishes well-structured datasets on GBIF.org as a Darwin Core Archive.
To date, publishing high quality data from PhD and Master's degrees and other small-scale biodiversity research studies has been difficult to do at scale. Standardizing data typically requires specialist knowledge of programming languages, data management techniques, and familiarity with specialist software.
Meanwhile, the process of gaining access to existing instances of the Integrated Publishing Toolkit (IPT)—the GBIF network's workhorse application for data sharing run by node staff with limited time and resources—can test a novice's patience. Training can do little to surmount such logistical barriers and others, like language, when occasional users forget the precise steps and details from year to year.
"Data standardization is hard, and biologists don't become biologists because they like coding or Excel, so a lot of potentially valuable data falls by the wayside," said Johaadien. "Recognizing that large language models have gotten really good at generating code and working with data, I built an automated tool to guide non-technical users through routine questions and process their messy data as much as possible, then publish it quickly and automatically to GBIF."
It's a massive pain in the arse for testing though. Checking which out of X number of things performs the best for your use case is quite annoying if you have to have X implementations. Having one setup where you just swap out keys and a few vars makes this massively easier.
1. Agentic Automation: For every alert/ticket coming in, the agent does a pre-investigation across relevant APIs, DBs, etc, helping identify FPs and providing more context on real ones. Cuts down on human time and speeds up handling.
2. Vibes Investigation: The same agentic reasoning is used when spelunking, where beyond just text2sql, the LLM will spend 2-10 minutes investigating Splunk, Databricks, etc. for you.
Underneath, the agent has tools like semantic layers over DBs, large log/text/dataframe analysers, etc.
Anything an AI agent does that is not that can be done cheaply and deterministically by some code.
If code can replace humans, it can replace AI.
But, that's just a guess. Maybe the combination of AI and automation adds something special to the mix where a global public ledger becomes more valuable (beyond the hobbyist community) and I'm just not seeing it.
This defines how workflows are used with modern systems in my experience. Workflows are often not predictable, they often execute one of a set of tools based on a response from a previous invocation (e.g. an LLM call).
The only software that we use is Langfuse for observability and that too was breaking down for us. But they launched a new version - V3 - which might still work out for us.
I would suggest just using standard, non-AI-specific Python libraries and building your own systems. If you are migrating from n8n to a self-hosted system, then you can actually use NonBioS to build it out for you directly. If you join our Discord channels, we can get an engineer to help you out as well.