
435 points by crawshaw | 1 comment
suninsight No.44002526
It only seems effective until you start using it for actual work. The biggest issue is context. All tool use creates context, and large code bases come with large context right off the bat. LLMs seem to work until they are hit with a sizeable context; anything above 10k tokens and the quality seems to deteriorate.
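
To make that concrete, here is a rough sketch of how tool use inflates context (the message format and the 4-chars-per-token heuristic are illustrative assumptions, not any particular vendor's API):

    def estimate_tokens(text: str) -> int:
        return len(text) // 4  # rough rule of thumb, not a real tokenizer

    messages = [
        {"role": "system", "content": "You are a coding agent."},
        {"role": "user", "content": "Fix the failing test."},
    ]

    # Every tool call's output is appended to the history and re-sent on
    # each subsequent turn. A couple of medium-sized files is all it takes.
    for tool_output in ["# core.py ...\n" * 1500, "# grep results ...\n" * 500]:
        messages.append({"role": "tool", "content": tool_output})

    total = sum(estimate_tokens(m["content"]) for m in messages)
    print(f"~{total} tokens of context before the model has done any work")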

The other issue is that LLMs can go off on a tangent. As context builds up, they forget what their objective was. One wrong turn, and down the rabbit hole they go, never to recover.
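
A common mitigation, sketched under simple assumptions (a flat chat history, the same rough token heuristic; build_prompt is a made-up name): keep recent history within a budget and re-pin the objective at the end of every prompt so it survives truncation.

    def build_prompt(objective: str, history: list[dict], budget: int = 8000) -> list[dict]:
        kept, used = [], 0
        # Walk the history newest-first and keep whatever fits the budget.
        for msg in reversed(history):
            cost = len(msg["content"]) // 4
            if used + cost > budget:
                break
            kept.append(msg)
            used += cost
        kept.reverse()
        # Re-state the objective last, where it is least likely to be lost.
        kept.append({"role": "user", "content": f"Reminder, the objective is: {objective}"})
        return kept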

The reason I know is that we started solving these problems a year back. And we aren't done yet. But we have covered a lot of distance.

[Plug]: Try it out at https://nonbios.ai:

- Agentic memory → long-horizon coding

- Full Linux box → real runtime, not just toy demos

- Transparent → see & control every command (sketched below, after this list)

- Free beta — no invite needed. Works with throwaway email (mailinator etc.)
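
To make the "Transparent" point concrete, here is a minimal sketch of what a command-approval gate can look like (illustrative code, not the actual NonBIOS implementation):

    import subprocess

    def run_with_approval(command: str) -> str:
        """Show the agent's proposed shell command and only run it if approved."""
        print(f"agent wants to run: {command}")
        if input("approve? [y/N] ").strip().lower() != "y":
            return "SKIPPED: user rejected the command"
        result = subprocess.run(command, shell=True, capture_output=True, text=True)
        return result.stdout + result.stderr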

bob1029 No.44003448
> One wrong turn, and down the rabbit hole they go, never to recover.

I think this is probably at the heart of the best argument against these things as viable tools.

Once you have sufficiently described the problem such that the LLM won't go the wrong way, you've likely already solved most of it yourself.

Tool use with error feedback sounds autonomous, but you'll quickly find that the error-handling layer is a thin proxy for the human operator's intentions.
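
To illustrate: in the sketch below, every branch of the error-handling layer is the operator's intent written down ahead of time (call_llm and run_tests are hypothetical stand-ins, not a real API):

    def agent_loop(task: str, call_llm, run_tests, max_turns: int = 5):
        feedback = ""
        for _ in range(max_turns):
            patch = call_llm(task + feedback)
            error = run_tests(patch)
            if error is None:
                return patch  # tests pass
            # Each branch below is a human decision, encoded in advance.
            if "ImportError" in error:
                feedback = "\nDo not add new dependencies."
            elif "timeout" in error:
                feedback = "\nDo not touch the I/O layer."
            else:
                feedback = f"\nTests failed with: {error}"
        return None  # out of turns; the human steps back in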

k__ No.44003526
True, but on the other hand, there are a bunch of tasks that are just very typing-intensive and not really complex.

Especially in GUI development: building forms, charts, etc.

I could imagine that LLMs are a great help here.
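
For instance, a sketch of that kind of task in Python (the field list and markup are made up): generating form fields from a declarative list is tedious to type by hand but mechanically simple, which is exactly the shape of work meant here.

    FIELDS = [
        ("email", "email", "Email address"),
        ("age", "number", "Age"),
        ("country", "text", "Country"),
    ]

    def render_form(fields) -> str:
        # Tedious to write by hand for 30 fields; trivial to generate.
        rows = []
        for name, input_type, label in fields:
            rows.append(
                f'<label for="{name}">{label}</label>\n'
                f'<input id="{name}" name="{name}" type="{input_type}" required>'
            )
        return "<form>\n" + "\n".join(rows) + "\n</form>"

    print(render_form(FIELDS))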