The unreasonable effectiveness of an LLM agent loop with tool use

(sketch.dev)

447 points crawshaw | 2 comments | 15 May 25 19:33 UTC

cadamsdotcom ◴[15 May 25 23:01 UTC] No.44000221[source]▶

> "Oh, this test doesn't pass... let's just skip it," it sometimes says, maddeningly.

Here is a wild idea. Imagine running a companion, policy-enforcing LLM, independently and in parallel, which is given instructions to keep the main LLM behaving according to instructions.

The companion LLM could, in real time, ban the coding LLM from emitting "let's just skip it" by watching for the tokens "let's just" and then biasing the output so that the word "skip" becomes impossible to emit.

Banning the word "skip" from following "let's just" forces the LLM down a new path, away from the undesired behavior.

It's like Structured Outputs or JSON mode, but driven by a companion LLM, and dynamically modified in real time as tokens are emitted.
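To make the mechanism concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the `BANS` table stands in for whatever the companion LLM would decide, tokens are treated as whole words, and `apply_bans` and `greedy_pick` are hypothetical helpers rather than part of Outlines or any real inference library.

```python
import math

# Hypothetical policy table: trigger phrase -> words that may not follow it.
# In the idea above, a companion LLM would update this while tokens stream.
BANS = {("let's", "just"): {"skip"}}

def apply_bans(context, logits, bans=BANS):
    """Return a copy of logits with banned next words set to -inf.

    context: list of words emitted so far
    logits:  dict mapping each candidate next word to its score
    """
    masked = dict(logits)
    for trigger, banned in bans.items():
        n = len(trigger)
        # Only act when the most recent words match the trigger phrase.
        if tuple(w.lower() for w in context[-n:]) == trigger:
            for word in banned:
                if word in masked:
                    masked[word] = -math.inf
    return masked

def greedy_pick(logits):
    """Pick the highest-scoring candidate (greedy decoding)."""
    return max(logits, key=logits.get)

context = ["Oh,", "this", "test", "fails...", "let's", "just"]
logits = {"skip": 3.2, "fix": 2.9, "delete": 1.0}  # made-up scores

unconstrained = greedy_pick(logits)
constrained = greedy_pick(apply_bans(context, logits))
```

With these made-up scores the unconstrained pick is "skip" and the constrained pick is "fix". A real implementation would work on token IDs inside the sampler, where one word can span several tokens.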

If the idea works, you could prompt the companion LLM to do more advanced stuff, e.g. ban a coding LLM from making tests pass by deleting the test code, or ban it from emitting pointless comments. All the policies that we put into system prompts today and pray the LLM will follow would go into the companion LLM's prompt instead.

Wonder what the Outlines folks think of this!

replies(3): >>44000267 #>>44000335 #>>44005843 #

JoshuaDavid ◴[15 May 25 23:08 UTC] No.44000267[source]▶

>>44000221 #

Along these lines, if the main LLM goes down a bad path, you could _rewind_ the model to before it started going down the bad path -- the watcher LLM doesn't necessarily have to guess that "skip" is a bad token after the words "let's just", it could instead see "let's just skip the test" and go "nope, rolling back to the token 'just ' and rerolling with logit_bias={"skip":-10,"omit":-10,"hack":-10}".
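A rough sketch of that rewind-and-reroll loop, under loud assumptions: `TOY_MODEL` and `sample_next` are a made-up stand-in for the coding LLM, tokens are whole words, and the bias dict is keyed by word for readability. In the OpenAI API, for instance, `logit_bias` is keyed by token ID, so a real version would need a tokenizer lookup first.

```python
# Toy stand-in for the coding LLM: scores for the next word given the last one.
TOY_MODEL = {
    "just": {"skip": 2.0, "fix": 1.5},
    "skip": {"it": 1.0},
    "fix": {"it": 1.0},
}

def sample_next(tokens, logit_bias):
    """Greedy next-word choice after adding logit_bias; None means stop."""
    scores = TOY_MODEL.get(tokens[-1])
    if not scores:
        return None
    biased = {t: s + logit_bias.get(t, 0.0) for t, s in scores.items()}
    return max(biased, key=biased.get)

def generate(prefix, logit_bias=None, max_tokens=8):
    """Extend prefix until the toy model stops or max_tokens is reached."""
    tokens = list(prefix)
    for _ in range(max_tokens):
        tok = sample_next(tokens, logit_bias or {})
        if tok is None:
            break
        tokens.append(tok)
    return tokens

def find_phrase(tokens, phrase):
    """Index of the first occurrence of phrase in tokens, or -1."""
    for i in range(len(tokens) - len(phrase) + 1):
        if tokens[i:i + len(phrase)] == phrase:
            return i
    return -1

def rewind_and_reroll(tokens, bad_phrase, bias):
    """If bad_phrase appears, cut back to just before its last word and
    regenerate from there with the given bias applied."""
    idx = find_phrase(tokens, bad_phrase)
    if idx < 0:
        return tokens
    kept = tokens[:idx + len(bad_phrase) - 1]  # keeps "... let's just"
    return generate(kept, logit_bias=bias)

first_try = generate(["let's", "just"])
second_try = rewind_and_reroll(
    first_try,
    ["let's", "just", "skip"],
    {"skip": -10, "omit": -10, "hack": -10},
)
```

Here the first attempt comes out as "let's just skip it" and the reroll as "let's just fix it". Against a hosted model, the rewind would amount to resending the truncated prefix, which only works if the provider lets you continue a partial assistant message.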

Of course doing that limits which model providers you can work with (notably, OpenAI has gotten quite hostile to power users doing stuff like that over the past year or so).

replies(1): >>44000505 #

1. cadamsdotcom ◴[15 May 25 23:54 UTC] No.44000505[source]▶

>>44000267 #

That’s a really neat idea.

Kind of seems like an optimization: if the “token ban” is a tool call, you can see that being too slow to run for every token. Provided rewinding is feasible, your idea could make it performant enough to be practical.

replies(1): >>44002693 #

2. TeMPOraL ◴[16 May 25 07:32 UTC] No.44002693[source]▶

>>44000505 (TP) #

It's not an optimization; the greedy approach won't work well, because it ignores critical context that comes after the tokens that would trigger the rejection.

Consider: "Let's just skip writing this test, as the other change you requested seems to eliminate the need for it."

Rolling back the model on "Let's just" would be stupid; rolling it back on "Let's just skip writing this test" would be stupid too, as is the belief that writing tests is your holy obligation to your god, and you must do so unconditionally. The following context makes it clear that the decision is fine. Or, if you (or the governor agent) don't buy the reasoning, you're then in a perfect position to say, "nope, let's roll back to <Let's> and bias against ["skip", "test"]".

Checking the entire message once makes sense; checking it after every token doesn't.
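A small sketch of the difference, with a deliberately crude stand-in for the governor. `toy_governor` is hypothetical and only looks for a justification word after "skip"; a real governor would be an LLM judging the reasoning itself.

```python
def toy_governor(text):
    """Stand-in for the governor LLM: True means 'accept'.

    Toy rule (an assumption for illustration): 'skip' is acceptable only
    if some justification follows it in the same message.
    """
    lowered = text.lower()
    if "skip" not in lowered:
        return True
    after = lowered.split("skip", 1)[1]
    return any(m in after for m in (" as ", " because ", " since "))

def check_every_token(tokens, governor):
    """Greedy: judge each growing prefix and reject at the first failure.
    Returns (accepted, number_of_governor_calls)."""
    for i in range(1, len(tokens) + 1):
        if not governor(" ".join(tokens[:i])):
            return False, i
    return True, len(tokens)

def check_once(tokens, governor):
    """Judge the finished message a single time."""
    return governor(" ".join(tokens)), 1

message = (
    "Let's just skip writing this test, as the other change you "
    "requested seems to eliminate the need for it."
).split()

greedy_result = check_every_token(message, toy_governor)
whole_result = check_once(message, toy_governor)
unjustified = check_once("Let's just skip it".split(), toy_governor)
```

The per-token check rejects at the third word, before the justification has been written, while the single whole-message check accepts after one governor call. The unjustified "Let's just skip it" is still rejected, at which point rolling back with a bias is an option.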
