
224 points azhenley | 1 comment
bitexploder No.45075293
If you look at how the most advanced commercial models are deployed, they already have much of this, including isolation. This post essentially sketches things I know already exist: not in the literal OS sense, but in terms of all of the features suggested. It still isn’t enough. Agents need powerful access to the things you care about in order to do their job. Granting them just enough permissions on those things is much harder than containing the LLM, and containing the LLM is already difficult. The right model for LLM security is an untrusted userspace, not an entire “OS”.
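
To make “untrusted userspace” concrete, here is a rough sketch of one way to do it (the broker, tool names, and API are invented for illustration, not taken from the post): keep credentials on a trusted side and hand the agent only narrow, single-use capability tokens.

    import secrets

    def run_tool(name, **args):
        # Stand-in for the real tool layer; hypothetical.
        return f"ran {name} with {args}"

    class CapabilityBroker:
        """Trusted side: holds the real credentials; the agent never sees them."""
        def __init__(self):
            self._grants = {}  # token -> (tool_name, constraints)

        def grant(self, tool_name, **constraints):
            token = secrets.token_hex(16)
            self._grants[token] = (tool_name, constraints)
            return token  # the agent gets this token, and nothing else

        def invoke(self, token, **args):
            if token not in self._grants:
                raise PermissionError("no such capability")
            tool_name, constraints = self._grants.pop(token)  # single use
            for key, allowed in constraints.items():
                if args.get(key) != allowed:
                    raise PermissionError(f"{key} outside grant")
            return run_tool(tool_name, **args)

    # Untrusted side: this agent can book exactly one flight to LIS, nothing more.
    broker = CapabilityBroker()
    token = broker.grant("book_flight", destination="LIS")
    broker.invoke(token, destination="LIS")  # ok, and the token is now spent

The point of the design: the hard part is scoping each grant tightly enough that the agent can still do its job, which is exactly the difficulty described above.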
wmorgan No.45075969
Untrusted userspace is exactly right. I’d expect these approaches to help on the margin, but the authors oversell their point with words like “guarantee.”

Control tool access the way OSes enforce file permissions: I understand it’s a metaphor, but isn’t the track record of OSes here pretty bad?
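
Taking the metaphor at face value, the mechanism itself is trivial; a hedged sketch of what “file permissions for tools” would look like (the bit names, principals, and tools here are mine, not the article’s):

    # Hypothetical: Unix-style mode bits applied to tools instead of files.
    INVOKE, READ_OUTPUT, CONFIGURE = 0o1, 0o2, 0o4

    ACL = {
        # (principal, tool) -> permission bits
        ("travel_agent", "calendar"): READ_OUTPUT,
        ("travel_agent", "booking"):  INVOKE | READ_OUTPUT,
    }

    def check(principal, tool, wanted):
        if (ACL.get((principal, tool), 0) & wanted) != wanted:
            raise PermissionError(f"{principal} lacks {wanted:o} on {tool}")

    check("travel_agent", "booking", INVOKE)        # passes
    try:
        check("travel_agent", "calendar", INVOKE)   # denied, as intended
    except PermissionError as e:
        print(e)

The OS track record suggests the check was never the hard part: confused deputies, overly broad defaults, and users clicking “allow” are.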

Check whether the agent is allowed to use the booking tool: so, a web browser? Isn’t a browser a pretty powerful general-purpose tool, which, by the way, could also expose the agent to a jailbreak?
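
A sketch of that failure mode, with invented tool names: the policy below denies the booking tool but allows a browser, and the browser is general-purpose enough to reach the same endpoint anyway, while every page it renders is untrusted input.

    # Hypothetical tool names; the point is the shape of the failure.
    ALLOWED_TOOLS = {"browser"}  # booking tool deliberately not granted

    def fetch(url):
        return f"<html>rendered {url}</html>"  # stub: a real browser fetches anything

    def dispatch(tool, **args):
        if tool not in ALLOWED_TOOLS:
            raise PermissionError(f"{tool} not allowed")
        if tool == "browser":
            return fetch(args["url"])

    # The check "works": dispatch("booking", flight="LIS123") raises PermissionError.
    # But the browser makes it moot, and every page it renders is a jailbreak surface:
    page = dispatch("browser", url="https://airline.example/book?flight=LIS123")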

> As such, security researchers have to devise new mitigations to prevent AI models taking adversarial actions even with the virtual machine constraints.

An understated reminder that yes, we really ought to solve alignment.