Tools: Code Is All You Need

(lucumr.pocoo.org)

313 points Bogdanp | 4 comments | 03 Jul 25 10:51 UTC


simonw ◴[03 Jul 25 14:22 UTC] No.44455353[source]▶

Something I've realized about LLM tool use is that it means that if you can reduce a problem to something that can be solved by an LLM in a sandbox using tools in a loop, you can brute force that problem.

The job then becomes identifying those problems and figuring out how to configure a sandbox for them, what tools to provide and how to define the success criteria for the model.

That still takes significant skill and experience, but it's at a higher level than chewing through that problem using trial and error by hand.

My assembly Mandelbrot experiment was the thing that made this click for me: https://simonwillison.net/2025/Jul/2/mandelbrot-in-x86-assem...
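
A minimal sketch of the loop this comment describes, under loud assumptions: `propose` is a hypothetical stand-in for the model call, `run_tool` for a sandboxed tool, and `is_success` for the success criteria. None of these names come from the linked experiment, and a toy integer guesser replaces the LLM so the block runs on its own.

```python
from typing import Any, Callable, List, Optional, Tuple

History = List[Tuple[Any, Any]]


def tool_loop(
    propose: Callable[[History], Any],
    run_tool: Callable[[Any], Any],
    is_success: Callable[[Any], bool],
    max_steps: int = 50,
) -> Optional[Tuple[int, Any, Any]]:
    """Try actions until one passes the success check or the budget runs out."""
    history: History = []
    for step in range(1, max_steps + 1):
        action = propose(history)   # hypothetical stand-in for the model call
        result = run_tool(action)   # hypothetical stand-in for a sandboxed tool
        history.append((action, result))
        if is_success(result):
            return step, action, result
    return None  # budget exhausted without meeting the success criteria


# Toy stand-ins: the "model" guesses the next integer, the "tool" squares it.
def next_guess(history: History) -> int:
    return len(history)


found = tool_loop(next_guess, lambda n: n * n, lambda r: r == 49)
print(found)  # (step count, winning action, tool result)
```

The `max_steps` budget is the part that keeps "brute force" bounded: when the success check can never pass, the loop returns `None` instead of running forever.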

replies(7): >>44455435 #>>44455688 #>>44456119 #>>44456183 #>>44456944 #>>44457269 #>>44458980 #

1. dist-epoch ◴[03 Jul 25 14:54 UTC] No.44455688[source]▶

>>44455353 #

I've been using a VM for a sandbox, just to make sure it won't delete my files if it goes insane.

With some host data directories mounted read-only inside the VM.

This creates some friction though. It feels like a tool that runs the AI agent in a VM, then copies its output to the host machine after some checks, would help, so that it feels like you are running it natively on the host.
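
One way to picture the tool wished for here, as a hedged sketch rather than an existing program: the agent's files land in a staging directory (standing in for whatever was pulled out of the VM), and a script copies a file to the host only if it passes some checks. The name `promote_outputs`, the size limit, and the particular checks are all assumptions for illustration.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import List


def promote_outputs(staging: Path, dest: Path, max_bytes: int = 1_000_000) -> List[Path]:
    """Copy regular files from staging into dest, skipping anything suspicious."""
    staging = staging.resolve()
    copied: List[Path] = []
    # os.walk does not descend into symlinked directories by default.
    for root, _dirs, files in os.walk(staging):
        for name in files:
            src = Path(root) / name
            if src.is_symlink() or not src.is_file():
                continue  # check 1: plain files only, never symlinks
            if src.stat().st_size > max_bytes:
                continue  # check 2: reject unexpectedly large outputs
            rel = src.relative_to(staging)
            target = dest / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, target)
            copied.append(rel)
    return sorted(copied)


# Demo with throwaway directories standing in for the VM export and the host.
with tempfile.TemporaryDirectory() as tmp:
    staging, host = Path(tmp, "staging"), Path(tmp, "host")
    staging.mkdir()
    host.mkdir()
    (staging / "report.txt").write_text("ok")
    (staging / "huge.bin").write_bytes(b"x" * 2048)
    copied = promote_outputs(staging, host, max_bytes=1024)
    print(copied)
```

Real checks would depend on the project (for example, running tests or reviewing a diff before copying); the two shown are only placeholders for that step.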

replies(2): >>44455753 #>>44456172 #

2. jitl ◴[03 Jul 25 15:00 UTC] No.44455753[source]▶

>>44455688 (TP) #

This is very easy to do with Docker. Not sure if you want the VM layer as an extra security boundary, but even so you can just specify the VM's Docker API endpoint to spawn processes and copy files in/out from shell scripts.
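
A rough sketch of what such a script might assemble, written in Python rather than shell so it can be checked without Docker installed: it only builds the `docker` argument lists. The helper names, image, endpoint, and paths are hypothetical. `-H`, `run --rm`, `-v`, `-w`, and `cp` are standard Docker CLI options, but check them against your Docker version before relying on this.

```python
from typing import List, Optional, Sequence


def docker_run_argv(
    image: str,
    command: Sequence[str],
    workdir: str,
    readonly_dirs: Sequence[str] = (),
    host: Optional[str] = None,
) -> List[str]:
    """Build a `docker run` command with read-only data and one writable mount."""
    argv = ["docker"]
    if host:
        argv += ["-H", host]  # Docker API endpoint, e.g. the one exposed by the VM
    argv += ["run", "--rm"]
    for d in readonly_dirs:
        argv += ["-v", f"{d}:{d}:ro"]  # data the agent may read but not modify
    # Bind-mount paths are resolved on the machine running the Docker daemon.
    argv += ["-v", f"{workdir}:/work", "-w", "/work", image, *command]
    return argv


def docker_cp_out_argv(container: str, src: str, dest: str,
                       host: Optional[str] = None) -> List[str]:
    """Build a `docker cp` command that copies a path out of a container."""
    argv = ["docker"]
    if host:
        argv += ["-H", host]
    # Only works while the container still exists (i.e. not removed by --rm).
    return argv + ["cp", f"{container}:{src}", dest]


# Hypothetical values throughout: endpoint, image, script, and paths.
run = docker_run_argv(
    "python:3.12-slim",
    ["python", "agent.py"],
    workdir="/tmp/checkout",
    readonly_dirs=["/data/reference"],
    host="ssh://user@sandbox-vm",
)
print(" ".join(run))
```

Either list can be passed to `subprocess.run(argv, check=True)` to actually execute it. With a writable mount like `/work`, results are already on the daemon's filesystem, so `docker cp` is mainly useful for paths that were not mounted.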

3. simonw ◴[03 Jul 25 15:39 UTC] No.44456172[source]▶

>>44455688 (TP) #

Have you tried giving the model a fresh checkout in a read-write volume?

replies(1): >>44456372 #

4. dist-epoch ◴[03 Jul 25 15:58 UTC] No.44456372[source]▶

>>44456172 #

Hmm, excellent idea. Somehow I assumed that it would be able to do damage in a writable volume, but it wouldn't be able to exit it; it would be confined to that directory.
