The unreasonable effectiveness of an LLM agent loop with tool use

(sketch.dev)

447 points crawshaw | 2 comments | 15 May 25 19:33 UTC | HN request time: 0.434s | source

Show context

libraryofbabel ◴[15 May 25 20:36 UTC] No.43999072[source]▶

Strongly recommend this blog post too which is a much more detailed and persuasive version of the same point. The author actually goes and builds a coding agent from zero: https://ampcode.com/how-to-build-an-agent

It is indeed astonishing how well a loop with an LLM that can call tools works for all kinds of tasks now. Yes, sometimes they go off the rails, there is the problem of getting that last 10% of reliability, etc. etc., but if you're not at least a little bit amazed then I urge you go to and hack together something like this yourself, which will take you about 30 minutes. It's possible to have a sense of wonder about these things without giving up your healthy skepticism of whether AI is actually going to be effective for this or that use case.

This "unreasonable effectiveness" of putting the LLM in a loop also accounts for the enormous proliferation of coding agents out there now: Claude Code, Windsurf, Cursor, Cline, Copilot, Aider, Codex... and a ton of also-rans; as one HN poster put it the other day, it seems like everyone and their mother is writing one. The reason is that there is no secret sauce and 95% of the magic is in the LLM itself and how it's been fine-tuned to do tool calls. One of the lead developers of Claude Code candidly admits this in a recent interview.[0] Of course, a ton of work goes into making these tools work well, but ultimately they all have the same simple core.

[0] https://www.youtube.com/watch?v=zDmW5hJPsvQ

replies(12): >>43999361 #>>43999593 #>>44000028 #>>44000133 #>>44000238 #>>44000739 #>>44002234 #>>44003725 #>>44003808 #>>44004127 #>>44005134 #>>44010227 #

1. deadbabe ◴[16 May 25 10:33 UTC] No.44003725[source]▶

>>43999072 #

Generally when LLM’s are effective like this, it means a more efficient non-LLM based solution to the problem exists using the tools you have provided. The LLM helps you find the series of steps and synthesis of inputs and outputs to make it happen.

It is expensive and slow to have an LLM use tools all the time for solving the problem. The next step is to convert frequent patterns of tool calls into a single pure function, performing whatever transformation of inputs and outputs are needed along the way (an LLM can help you build these functions), and then perhaps train a simple cheap classifier to always send incoming data to this new function, bypassing LLMs all together.

In time, this will mean you will use LLMs less and less, limiting their use to new problems that are unable to be classified. This is basically like a “cache” for LLM based problem solving, where the keys are shapes of problems.

The idea of LLMs running 24/7 solving the same problems in the same way over and over again should become a distant memory, though not one that an AI company with vested interest in selling as many API calls as possible will want people to envision. Ideally LLMs are only needed to be employed once or a few times per novel problem before being replaced with cheaper code.

replies(1): >>44058190 #

2. toobulkeh ◴[22 May 25 02:24 UTC] No.44058190[source]▶

>>44003725 (TP) #

I’ve been tinkering with this, but haven’t found a pattern or library of someone solving this.

Have you?

↑