434 points crawshaw | 11 comments
libraryofbabel No.43999072
Strongly recommend this blog post too which is a much more detailed and persuasive version of the same point. The author actually goes and builds a coding agent from zero: https://ampcode.com/how-to-build-an-agent

It is indeed astonishing how well a loop with an LLM that can call tools works for all kinds of tasks now. Yes, sometimes they go off the rails, there is the problem of getting that last 10% of reliability, etc., but if you're not at least a little bit amazed then I urge you to go and hack together something like this yourself, which will take you about 30 minutes. It's possible to have a sense of wonder about these things without giving up your healthy skepticism of whether AI is actually going to be effective for this or that use case.

This "unreasonable effectiveness" of putting the LLM in a loop also accounts for the enormous proliferation of coding agents out there now: Claude Code, Windsurf, Cursor, Cline, Copilot, Aider, Codex... and a ton of also-rans; as one HN poster put it the other day, it seems like everyone and their mother is writing one. The reason is that there is no secret sauce and 95% of the magic is in the LLM itself and how it's been fine-tuned to do tool calls. One of the lead developers of Claude Code candidly admits this in a recent interview.[0] Of course, a ton of work goes into making these tools work well, but ultimately they all have the same simple core.

[0] https://www.youtube.com/watch?v=zDmW5hJPsvQ
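The "LLM in a loop with tools" core described above can be sketched in a few dozen lines. This is a toy sketch, not any vendor's API: `call_llm` stands in for a real chat-completions call (stubbed here so the control flow runs offline), and the single `read_file` tool, the file name, and the message format are illustrative assumptions.

```python
def read_file(path: str) -> str:
    """Tool: return a file's contents."""
    with open(path) as f:
        return f.read()

TOOLS = {"read_file": read_file}

def call_llm(messages):
    # Stub for a real model call. A real agent would send `messages` to an
    # LLM and get back either plain text or a structured tool-call request;
    # here we fake one tool call followed by a final answer.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "read_file", "args": {"path": "notes.txt"}}
    return {"text": "The file says: " + messages[-1]["content"]}

def agent(prompt: str) -> str:
    messages = [{"role": "user", "content": prompt}]
    while True:
        reply = call_llm(messages)
        if "tool" in reply:   # model asked to use a tool: run it, loop again
            result = TOOLS[reply["tool"]](**reply["args"])
            messages.append({"role": "tool", "content": result})
        else:                 # plain answer: the loop ends
            return reply["text"]
```

Swap the stub for a real API call and register more tools, and this is, structurally, the whole agent.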

datpuz No.44000739
Can't think of anything an LLM is good enough at to let it run on its own in a loop for more than a few iterations before I need to rein it back in.
CuriouslyC No.44000859
The main problem with agents is that they aren't reflecting on their own performance and pausing their own execution to ask a human for help aggressively enough. Agents can run on for 20+ iterations in many cases successfully, but also will need hand holding after every iteration in some cases.

They're a lot like a human in that regard, but we haven't been building that reflection and self-awareness into them so far, so it's like a junior who doesn't realize they're out of their depth and should get help.

ariwilson No.44001353
Is there value in adding an overseer LLM that measures the progress between n steps and if it's too low stops and calls out to a human?
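One shape such an overseer could take, as a sketch: a wrapper that scores the work after each agent step and escalates to a human when the last n steps show too little gain. Everything here is an assumption for illustration; in particular, `score` would have to be filled in per project (passing tests, resolved TODOs, or similar) and is the genuinely hard part.

```python
def run_with_overseer(step, score, n=3, min_gain=1, max_steps=20):
    """step() performs one agent iteration; score() returns a number that
    should rise as work progresses. Returns ('done'|'stalled', history)."""
    history = [score()]
    for _ in range(max_steps):
        step()
        history.append(score())
        # Too little progress over the last n steps: pause and ask a human.
        if len(history) > n and history[-1] - history[-1 - n] < min_gain:
            return "stalled", history
    return "done", history
```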
solumunus No.44001399
And how does it effectively measure progress?
NotMichaelBay No.44001501
It can behave just as a senior would: produce the set of steps for the junior to follow, and assess whether the junior appears stuck at any particular step.
chongli No.44001524
Producing the set of steps is the hard part. If you can do that, you don’t need a junior to follow it, you have a program to execute.
abletonlive No.44001622
If this is true then we wouldn't have senior engineers that delegate. My suggestion is to think a couple more cycles before hitting that reply button. It'll save us all from reading obviously and confidently wrong statements.
adastra22 No.44001666
It is a task that LLMs are quite good at.
guappa No.44002318
AI aren't real people… You do that with real people because you can't just rsync their knowledge.

Only on this website of completely reality-detached individuals would such an obvious comment be needed.
TeMPOraL No.44002517
Senior engineers delegate in part because they're coaxed into a faux-management role (all of the responsibilities, none of the privileges). Coding is done by juniors; by the time anyone gains enough experience to finally begin to know what they're doing, they're relegated to "mentoring" and "training" new cohort of fresh juniors.

Explains a lot about software quality these days.

Jensson No.44002577
If the LLM could actually generate good steps that made forward progress, there would be no problem building agents at all; but agents are really bad, so LLMs can't be good at that.

If you feel those tips are good, then you are just a bad judge of tips. There is a reason self-help books sell so well even though they don't really help anyone: their goal is to write a lot of tips that sound good, since they are kind of vague and general, but that don't really help the reader.

adastra22 No.44002726
I use agentic LLMs every single day and get tremendous value. Asking the LLM to produce a set of bite-sized tasks with built-in corrective reminders is something that they're really good at. It gives good results.

I'm sorry if you're using it wrong.

TeMPOraL No.44002871
Seconding. In recent months, when using Aider, I've taken the approach of discussing a piece of work (a new project, a larger change) and asking the model to prepare a plan of action. After possibly a little back and forth, I approve the plan and ask the LLM to create or update a specification document for the project, plus a plan document that breaks the work down into a sequence of bite-sized tasks; the latter is there to keep both me and the LLM on track. With that set, I can just keep repeatedly telling it to "continue implementation of the plan", and it does exactly that.

Eventually it'll do something wrong or I realize I wanted things differently, which necessitates some further conversation, but other than that, it's just "go on" until we run out of plan, then devising a new plan, rinse repeat.
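The plan-document workflow described above could be mechanized with a driver like the following sketch. The Markdown-checkbox plan format and the helper names are assumptions chosen for illustration, not Aider's actual mechanism; the loop is just the code equivalent of repeatedly saying "continue the plan".

```python
def next_task(plan_text: str):
    """Return the first unchecked '- [ ]' task, or None if the plan is done."""
    for line in plan_text.splitlines():
        if line.startswith("- [ ] "):
            return line[len("- [ ] "):]
    return None

def run_plan(plan_text: str, do_task) -> str:
    """Feed each open task to the agent callback, ticking it off afterwards."""
    while (task := next_task(plan_text)) is not None:
        do_task(task)  # hand one bite-sized task to the coding agent
        plan_text = plan_text.replace("- [ ] " + task, "- [x] " + task, 1)
    return plan_text
```

The human stays in the loop at the plan level: editing the plan file between runs is the "further conversation" step.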

adastra22 No.44007220
This is pretty much what I do. It works very well.
abletonlive No.44008503
Or, you know, they are leading big initiatives and can't do it all by themselves. Seniors can also delegate to other seniors. I am beyond senior with 11 years of experience and still code on a ton of my initiatives.
abletonlive No.44008540
So... you don't think you can give LLMs more knowledge? You're the one operating in a detached reality. The reality is that a ton of engineers are finding LLMs useful, such as the author.

Maybe consider that if you don't find them useful, you're working on problems they're not good at, or, even more likely, you just suck at using the tools.

Anybody who gets value out of LLMs has a hard time understanding how one could conclude they are useless and that you can't "give it instructions because that's the hard part", but it's actually really easy to understand: the folks who think this are just bad at it. We aren't living in some detached reality. The reality is that some people are just better at this than others.