435 points | crawshaw

libraryofbabel:
Strongly recommend this blog post too, which is a much more detailed and persuasive version of the same point. The author actually goes and builds a coding agent from zero: https://ampcode.com/how-to-build-an-agent

It is indeed astonishing how well a loop with an LLM that can call tools works for all kinds of tasks now. Yes, sometimes they go off the rails, there is the problem of getting that last 10% of reliability, etc. etc., but if you're not at least a little bit amazed then I urge you to go and hack together something like this yourself, which will take you about 30 minutes. It's possible to have a sense of wonder about these things without giving up your healthy skepticism of whether AI is actually going to be effective for this or that use case.

This "unreasonable effectiveness" of putting the LLM in a loop also accounts for the enormous proliferation of coding agents out there now: Claude Code, Windsurf, Cursor, Cline, Copilot, Aider, Codex... and a ton of also-rans; as one HN poster put it the other day, it seems like everyone and their mother is writing one. The reason is that there is no secret sauce and 95% of the magic is in the LLM itself and how it's been fine-tuned to do tool calls. One of the lead developers of Claude Code candidly admits this in a recent interview.[0] Of course, a ton of work goes into making these tools work well, but ultimately they all have the same simple core.

[0] https://www.youtube.com/watch?v=zDmW5hJPsvQ
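
For anyone who wants to try the 30-minute version: here is a minimal sketch of such a loop in Python against the Anthropic SDK's tool-use API (the linked post builds the same thing in Go). The single read_file tool, the model name, and the prompt are illustrative choices on my part, not anything prescribed by the post:

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # One tool is enough to see the effect; real agents add edit_file, bash, etc.
    TOOLS = [{
        "name": "read_file",
        "description": "Read and return the contents of the file at the given relative path.",
        "input_schema": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    }]

    def run_tool(name: str, args: dict) -> str:
        if name == "read_file":
            with open(args["path"]) as f:
                return f.read()
        raise ValueError(f"unknown tool: {name}")

    messages = [{"role": "user", "content": "What does main.py do?"}]
    while True:
        resp = client.messages.create(
            model="claude-3-5-sonnet-latest",  # illustrative; any tool-calling model works
            max_tokens=1024,
            tools=TOOLS,
            messages=messages,
        )
        messages.append({"role": "assistant", "content": resp.content})
        if resp.stop_reason != "tool_use":
            break  # the model gave a final answer instead of calling a tool
        # Execute every requested tool call and feed the results back in.
        results = [{
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": run_tool(block.name, block.input),
        } for block in resp.content if block.type == "tool_use"]
        messages.append({"role": "user", "content": results})

    print(resp.content[0].text)

That really is the whole trick: the model decides when it wants a tool, the loop runs the tool and appends the result, and it keeps going until the model answers instead of calling.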

datpuz:
Can't think of anything an LLM is good enough at to let it run on its own in a loop for more than a few iterations before I need to rein it back in.

CuriouslyC:
The main problem with agents is that they don't reflect on their own performance and pause their own execution to ask a human for help aggressively enough. In many cases agents can run on successfully for 20+ iterations, but in other cases they need hand-holding after every single one.

They're a lot like humans in that regard, but we haven't been building that reflection and self-awareness into them so far, so it's like a junior who doesn't realize when they're out of their depth and should ask for help.

ariwilson:
Is there value in adding an overseer LLM that measures progress every n steps and, if it's too low, stops and calls out to a human?
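
One possible sketch of that idea, with the overseer abstracted to a stub (the cadence, the threshold, and the score_progress helper are made-up names and numbers, not anything from this thread):

    # Illustrative numbers, not from the thread.
    CHECK_EVERY = 5        # overseer reviews the transcript every 5 steps
    MIN_PROGRESS = 0.3     # below this score, escalate to a human

    def score_progress(recent_steps: list[str]) -> float:
        """Rate progress from 0.0 (stuck) to 1.0 (on track).

        Stub: in practice this would be a second, cheaper LLM call that
        sees only the last few steps and returns a number.
        """
        raise NotImplementedError("wire an overseer model in here")

    def run_with_overseer(agent_step, goal: str, max_steps: int = 50) -> list[str]:
        transcript: list[str] = []
        for step in range(1, max_steps + 1):
            # agent_step is one iteration of a tool-calling loop like the one upthread.
            transcript.append(agent_step(goal, transcript))
            if step % CHECK_EVERY == 0:
                if score_progress(transcript[-CHECK_EVERY:]) < MIN_PROGRESS:
                    # Low score: block here and hand control to a human.
                    hint = input("Agent looks stuck. Guidance (or 'stop'): ")
                    if hint.strip() == "stop":
                        break
                    transcript.append(f"HUMAN: {hint}")
        return transcript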

CuriouslyC:
I don't think you need an overseer for this: you can just have the agent self-assess at each step whether it's making material progress or caught in a loop, and if it's caught, pause and emit a prompt asking a human for help. This would probably require a bit of tuning, and the agent needs to be set up with a blocking "ask for help" function, but it's totally doable.
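
Concretely, the blocking "ask for help" function can be one more tool bolted onto the kind of loop sketched upthread, plus a line of system prompt. The names and wording below are illustrative, not from any particular agent:

    # One extra tool; calling it blocks the loop until a human replies.
    HELP_TOOL = {
        "name": "ask_for_help",
        "description": "Call this when you are stuck or looping without making "
                       "material progress. Blocks until a human responds.",
        "input_schema": {
            "type": "object",
            "properties": {"question": {"type": "string"}},
            "required": ["question"],
        },
    }

    # Passed as the system prompt so the model self-assesses every step.
    SELF_ASSESS = (
        "After each step, briefly assess whether you are making material "
        "progress. If you are stuck or repeating yourself, call ask_for_help "
        "instead of continuing."
    )

    def run_tool(name: str, args: dict) -> str:
        # New branch for the run_tool dispatcher from the earlier sketch.
        if name == "ask_for_help":
            # input() blocks, so the whole agent loop pauses right here
            # until a human types an answer.
            return input(f"Agent asks: {args['question']}\n> ")
        raise ValueError(f"unknown tool: {name}")

Append HELP_TOOL to the tools list and pass SELF_ASSESS as the system prompt, and the model can suspend its own loop simply by calling the tool.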