
435 points | crawshaw | 2 comments
suninsight No.44002526
It only seems effective until you start using it for actual work. The biggest issue: context. All tool use creates context, and large code bases come with large context right off the bat. LLMs seem to work until they are hit with a sizeable context. Anything above 10k tokens and the quality seems to deteriorate.
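
To make the context problem concrete, here is a minimal sketch of the kind of budget guard an agent loop ends up needing. The threshold, message format, and the 4-characters-per-token estimate are all placeholder assumptions, not anyone's actual implementation:

    # Hypothetical sketch: keep an agent's history under a token budget
    # by truncating stale tool outputs, which are usually the bulkiest
    # part of the context once the agent has already acted on them.
    MAX_CONTEXT_TOKENS = 10_000  # rough point where quality degrades

    def estimate_tokens(text: str) -> int:
        # Crude heuristic (~4 chars/token); a real agent would use
        # the model's own tokenizer instead.
        return len(text) // 4

    def trim_history(messages: list[dict], budget: int = MAX_CONTEXT_TOKENS) -> list[dict]:
        total = sum(estimate_tokens(m["content"]) for m in messages)
        trimmed = []
        for m in messages:  # oldest first, so stale outputs are trimmed first
            if total > budget and m["role"] == "tool":
                placeholder = m["content"][:200] + "\n[...truncated...]"
                total -= estimate_tokens(m["content"]) - estimate_tokens(placeholder)
                trimmed.append({**m, "content": placeholder})
            else:
                trimmed.append(m)
        return trimmed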

The other issue is that LLMs can go off on a tangent. As context builds up, they forget what their objective was. One wrong turn, and down the rabbit hole they go, never to recover.
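
One common mitigation for that drift (a generic pattern, not necessarily what we ended up shipping) is to re-inject the original objective into every turn, so it can never scroll out of the window:

    # Hypothetical sketch: pin the objective to the system prompt on
    # every turn instead of stating it once at the start of the session.
    def build_prompt(objective: str, history: list[dict], observation: str) -> list[dict]:
        return [
            {"role": "system",
             "content": f"Your objective: {objective}\n"
                        "Before acting, restate how the next step serves it."},
            *history,
            {"role": "user", "content": observation},
        ]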

The reason I know is that we started solving these problems a year back. We aren't done yet, but we have covered a lot of distance.

[Plug]: Try it out at https://nonbios.ai:

- Agentic memory → long-horizon coding

- Full Linux box → real runtime, not just toy demos

- Transparent → see & control every command

- Free beta — no invite needed. Works with throwaway email (mailinator etc.)

replies(3): >>44002665 >>44003000 >>44003448
1. moffkalast No.44003000
Some of the thinking models might recover... with an extra 4k tokens used up in <thinking>. And even if they were stable at long contexts, the speed drops massively. You just can't win with this architecture lol.
replies(1): >>44003054
2. suninsight No.44003054
That matches what we have found very closely. <thinking> models do a lot better, but with huge speed drops. For now, we have chosen accuracy over speed. But the speed drop is around 3-4x, so we might move to an architecture where we 'think' only sporadically.
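
Sporadic thinking could be as simple as a router that only pays the 3-4x cost on steps that look hard. Everything below (the signals, the model names) is a hypothetical sketch, not our actual architecture:

    # Hypothetical sketch: route to the slow <thinking> model only when
    # a step looks hard; use the fast model for routine tool calls.
    HARD_SIGNALS = ("traceback", "test failed", "merge conflict", "design")

    def pick_model(step: str, recent_failures: int) -> str:
        hard = recent_failures >= 2 or any(s in step.lower() for s in HARD_SIGNALS)
        return "slow-thinking-model" if hard else "fast-model"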

Everything happening in the LLM space is so close to how humans think naturally.