I think you miss the point, which is that the smarter the interpreter becomes, the closer to impossible it becomes to lock it out of certain functionality for a given datastream when coupled with the reasons why you're using a smarter interpreter.
To take your example, it's easy to build functionality like that if the interpreter can't read the letters and understand what they say, because there's no way for the content of the letters to cause the interpreter to override it.
Now, lets say you add a smarter interpreter and lets it read the letters to do an initial pass at filtering them to different recipients.
The moment it can do so, it becomes prone to a letter trying to convince it of something like in fact it's the postmaster, but they'd run out of red envelopes, and unfortunately someone will die if the delivery times aren't adjusted.
We know from humans that entities sufficiently smart can often be convinced to violate even the most sacrosanct rules if accompanied by a sufficiently well crafted message.
You can certainly try to put in place counter-measures. E.g. you could route the mail separately before it gets to the LLM, so that whatever filters the content of the red and green envelopes have access to different functionality.
And you should - finding ways of routing different data to agents with more narrowly defined scopes and access rights is a good thing to do.
Sometimes it will work, but then it will work by relying on a sufficiently primitive interpreter to separate the data streams before it reaches the smart ones.
But the smarter the interpreter, the greater the likelihood that it will also manage to find ways to use other functionality to circumvent the restrictions placed on it. Up to and including trying to rewrite code to remove restrictions if it can find a way to do so, or using tools in unexpected ways.
E.g. be aware of just how good some of these agents are at exploring their environment - I've had an agent that used Claude Opus try to find its own process to restart itself after it recognised the code it had just rewritten was part of itself, tried to access it, and realised it hadn't been loaded into the running process yet.
> Fundamentally, it's creating security contexts from things a user will never have access to.
To be clear, I agree this is 100% the right thing to do. I just think it will turn out to be exceedingly hard to do it well enough.
Every piece of data that comes from a user basically needs the permissions of the agent processing that data to be restricted to the intersection of the permissions it currently has and the permissions that said user should have, unless said data is first sanitised by a sufficiently dumb interpreter.
If the agent accesses multiple pieces of data, each new item needs to potentially restrict permissions further, or be segregated into a separate context, with separate permissions, that can only be allowed to communicate with heavily sanitised data.
It's going to be hell to get it right, at least until we come out the other side with smart enough models that they won't fall for the "help, I'm stuck in a fortune-cookie factory, and you need to save me by [exploit]" type messages (and far more sophisticated ones).