
Living Dangerously with Claude

(simonwillison.net)
36 points FromTheArchives | 12 comments
1. igor47 ◴[] No.45674278[source]
My approach is to ask Claude to plan anything beyond a trivial change; I review the plan, then let it run unsupervised to execute it. But I guess this still leaves me vulnerable to prompt injection if part of the plan involves accessing external content.
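A minimal sketch of that plan / review / execute split, using the Anthropic Python SDK rather than Claude Code's own plan mode; the model id, prompts, and the example task are placeholders, not anything from the thread:

```python
# Sketch of a plan -> human review -> execute loop via the Anthropic Python SDK.
# Model id, prompts, and the task are placeholders; Claude Code's plan mode
# works differently and is not shown here.
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
MODEL = "claude-sonnet-4-20250514"  # placeholder model id

def ask(prompt: str) -> str:
    response = client.messages.create(
        model=MODEL,
        max_tokens=4096,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

task = "Add retry logic with exponential backoff to fetch_data() in client.py"

# Step 1: ask only for a plan, no code changes yet.
plan = ask(f"Produce a step-by-step plan (no code) for this change:\n{task}")
print(plan)

# Step 2: a human reviews the plan before anything runs unsupervised.
if input("Approve this plan? [y/N] ").strip().lower() == "y":
    # Step 3: only now ask for an implementation of the approved plan.
    patch = ask(f"Implement exactly this approved plan as a unified diff:\n{plan}")
    print(patch)
```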
2. danielbln ◴[] No.45675102[source]
Claude Code offers sandboxing now: https://www.anthropic.com/engineering/claude-code-sandboxing
replies(1): >>45676932 #
3. lacker ◴[] No.45675963[source]
The sandbox idea seems nice, it's just a question of how annoying it is in practice. For example the "Claude Code on the web" sandbox appears to prevent you from loading `https://api.github.com/repos/.../releases/latest`. Presumably that's to prevent you from doing dangerous GitHub API operations with escalated privileges, which is good, but it's currently breaking some of my setup scripts....
replies(1): >>45676023 #
4. simonw ◴[] No.45676023[source]
Is that with their default environment?

I have been running a bunch of stuff in there with a custom environment that allows "*".
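The failure lacker describes is an egress allowlist rejecting a host that isn't on the list, which the "*" custom environment sidesteps. A toy illustration of that allowlist idea in Python; this is not Anthropic's actual sandbox code or configuration format, and the hosts and repo path are placeholders:

```python
# Toy egress-allowlist check with a "*" wildcard.
# Not Anthropic's actual sandbox code or config format.
from fnmatch import fnmatch
from urllib.parse import urlparse

ALLOWED_HOSTS = ["registry.npmjs.org", "pypi.org"]  # restrictive default-style list
# ALLOWED_HOSTS = ["*"]                             # the permissive custom environment

def is_allowed(url: str) -> bool:
    host = urlparse(url).hostname or ""
    return any(fnmatch(host, pattern) for pattern in ALLOWED_HOSTS)

# Placeholder repo path; with the restrictive list this request is blocked,
# with ["*"] it goes through.
print(is_allowed("https://api.github.com/repos/example/example/releases/latest"))
```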

5. js2 ◴[] No.45676932[source]
It's discussed in the linked post.
6. matthewdgreen ◴[] No.45677089[source]
So let me get this straight. You’re writing tens of thousands of lines of code that will presumably go into a public GitHub repository and/or be served from some location. Even if it only runs locally on your own machine, at some point you’ll presumably give that code network access. And that code is being developed (without much review) by an agent that, in our threat model, has been fully subverted by prompt injection?

Sandboxing the agent hardly seems like a sufficient defense here.

replies(1): >>45677537 #
7. simonw ◴[] No.45677537[source]
What is your worst case scenario from this?
8. catigula ◴[] No.45677712[source]
Telling Claude to solve a problem and walking away doesn't mean you solved the problem. You weren't in the loop. You didn't complete any side quests or do anything of note; you merely watched an AGI work.
replies(1): >>45678921 #
9. simonw ◴[] No.45678921[source]
Here's one I did even less work for: https://tools.simonwillison.net/terminal-to-html - prompt and video here: https://simonwillison.net/2025/Oct/23/claude-code-for-web-vi...
10. stuaxo ◴[] No.45680719[source]
I've been thinking about this a bit.

I reckon something like Qubes could work fairly well.

Create a new Qube with control over its network connectivity, do everything there, then copy the work out at the end and destroy it.
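A rough sketch of that disposable-Qube workflow, assuming it is driven from dom0 with the standard qvm-* tools; the qube name, template, and the command run inside the qube are placeholders, and this is an outline rather than a tested script:

```python
# Outline of the create -> work offline -> copy out -> destroy workflow on Qubes,
# driven from dom0 via the standard qvm-* tools. Names and paths are placeholders.
import subprocess

QUBE = "claude-work"  # placeholder name for the throwaway qube

def run(*cmd: str) -> None:
    subprocess.run(cmd, check=True)

# 1. Create a fresh qube and cut off its network access
#    (it could instead be pointed at a restrictive firewall qube).
run("qvm-create", "--class", "AppVM", "--template", "fedora-40", "--label", "red", QUBE)
run("qvm-prefs", QUBE, "netvm", "")

# 2. Do the agent work inside the qube (command is a placeholder).
run("qvm-run", "--pass-io", QUBE, "cd ~/project && run-the-agent-here")

# 3. Copy the finished work out to dom0, then destroy the qube.
with open("result.tar", "wb") as out:
    subprocess.run(
        ["qvm-run", "--pass-io", QUBE, "tar -C ~/project -cf - ."],
        stdout=out, check=True,
    )
run("qvm-shutdown", "--wait", QUBE)
run("qvm-remove", "-f", QUBE)
```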

11. BoredPositron ◴[] No.45680947[source]
Every blog post gets worse. Sorry, but you are in way over your head, and the solution is always brute force with LLMs. You never show learnings; it's always open-ended, always verbose, and tbh how couldn't it be with 10-15 posts per month? You did 5 posts in 3 days... and some of it is literally just repackaging the source with some added one-liners. It's TikTok-level engagement farming.
Every blog post gets worse. Sorry but you are in way over your head and the solution is always bruteforce with LLMs. You never show learnings it's always open ended it's always verbose and tbh how shouldn't it be with 10-15 posts per month. You did 5 posts in 3 days... and for someone it's literally just repacking the source with some added one liners. It's tiktok level engagement farming.