
Living Dangerously with Claude

(simonwillison.net)
200 points by FromTheArchives | 4 comments
matthewdgreen ◴[] No.45677089[source]
So let me get this straight. You’re writing tens of thousands of lines of code that will presumably go into a public GitHub repository and/or be served from some location. Even if it only runs locally on your own machine, at some point you’ll presumably give that code network access. And that code is being developed (without much review) by an agent that, in our threat model, has been fully subverted by prompt injection?

Sandboxing the agent hardly seems like a sufficient defense here.

replies(3): >>45677537 #>>45684527 #>>45686450 #
tptacek ◴[] No.45684527[source]
Where did "without much review" come from? I don't see that in the deck.
replies(2): >>45684731 #>>45688191 #
matthewdgreen ◴[] No.45688191[source]
He wrote 14,000 lines of code in several days. How much review is going on there?
replies(1): >>45688711 #
1. simonw ◴[] No.45688711[source]
Oh hang on, I think I've spotted a point of confusion here.

All three of the projects I described in this talk carry effectively zero risk of containing harmful unreviewed code.

DeepSeek-OCR on the Spark? I ran that one in a Docker container, saved some notes on the process and then literally threw away the container once it had finished.
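
Roughly, the throwaway-container pattern looks like this when driven from a small Node script (a sketch, not the exact commands I ran; the image name, mount path and entrypoint are made up):

    // Disposable sandbox run: --rm deletes the container's filesystem as soon
    // as the process exits, so nothing the job wrote survives afterwards.
    import { execFileSync } from "node:child_process";

    execFileSync("docker", [
      "run", "--rm",
      "-v", `${process.cwd()}/notes:/work`,  // mount only a scratch directory
      "deepseek-ocr-sandbox",                 // hypothetical image name
      "python", "/work/run_ocr.py",           // hypothetical entrypoint
    ], { stdio: "inherit" });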

The Pyodide in Node.js one I did actually review, because it's code I execute on a machine that isn't disposable. The initial research ran in a disposable remote container though (Claude Code for web).
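
For anyone who hasn't used it, the Pyodide-in-Node pattern is roughly this (a minimal sketch using the pyodide npm package, not the project's actual code):

    // Load the WebAssembly build of CPython inside Node.js and run Python in it.
    import { loadPyodide } from "pyodide";

    async function main() {
      const pyodide = await loadPyodide();
      // The Python code executes inside the WASM runtime, not as a native process.
      const result = await pyodide.runPythonAsync("sum(i * i for i in range(10))");
      console.log(result);  // 285
    }

    main();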

The Perl in WebAssembly one? That runs in a browser sandbox. There's effectively nothing bad that can happen there, which is why I like WebAssembly so much.

I am a whole lot more cautious in reviewing code that has real stakes attached to it.

replies(1): >>45691823 #
2. matthewdgreen ◴[] No.45691823[source]
Understood. I read the article as “here is how to do YOLO coding safely”, and part of the “safely” idea was to sandbox the coding agent. I’m just pointing out that this, by itself, seems insufficient to prevent ugly exfiltration; it just makes exfiltration take an extra step. I’m also not sure that human code review scales to this much code, nor that it can contain that kind of exfiltration if the instructions specify some kind of obfuscation.

Obviously your recommendation to sandbox network access is one of several you make (the most effective one being “don’t let the agent ever touch sensitive data”), so I’m not saying the combined set of protections won’t work well. I’m also not saying that your projects specifically have any risk, just that they illustrate how much code you can end up with very quickly — making human review a fool’s errand.

ETA: if you do think human review can prevent secret exfiltration, I’d love to turn that into some kind of competition. Think of it as the obfuscated C contest with a scarier twist.

replies(2): >>45695412 #>>45695564 #
3. tptacek ◴[] No.45695412[source]
Is it your claim that LLMs will produce subtly obfuscated secret exfiltrations?
4. thadt ◴[] No.45695564[source]
It's an interesting risk tradeoff to think about. Is 14k lines of LLM-generated code more likely to have an attack in it than 14k lines of transitive library dependencies I get when I add a package to my project?

In the library case, there is a network of people who could (and sometimes do) deliberately inject attacks into the supply chain. On the other hand, those libraries are used and looked at by other people, so the odds of detection are higher.

With LLM-generated code, the initial developer is the only one looking at it. Getting an attack through in the first place seems harder, but detection probability is lower.