
224 points azhenley
ayende ◴[] No.45075710[source]
That is the wrong abstraction to think in. The problem is not _which_ tools you give the LLM; the problem is what actions it can take.

For example, in the book-a-ticket scenario - I want it to be able to check a few websites to compare prices, and I want it to be able to pay for me.

I don't want it to decide to send me on a 37-hour trip with three stops because it is $3 cheaper.

Alternatively, I want to be able to look up my benefits status, but the LLM should physically not be able to provide me any details about the benefits status of my coworkers.

That is the _same_ tool call, but in a different scope.

For that matter, if I'm in HR I _should_ be able to look at the benefits status of the employees I am responsible for, of course, but that access should create an audit log, etc.

In other words, it isn't the action that matters, but the intent behind it.

The LLM should be placed in the same box as the user it is acting on behalf of.
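
Roughly, that means the same tool call runs under the acting user's scope, with an audit trail where policy demands one. A minimal sketch in Python, with all names (User, lookup_benefits, AUDIT_LOG) made up for illustration:

```python
# Hypothetical sketch: one benefits-lookup tool, boxed to the on-behalf-of user.
from dataclasses import dataclass, field

AUDIT_LOG: list[str] = []

@dataclass
class User:
    user_id: str
    is_hr: bool = False
    reports: set[str] = field(default_factory=set)  # employees this user is responsible for

def lookup_benefits(acting_user: User, target_employee: str) -> str:
    """Benefits lookup executed with the permissions of the user the LLM acts for."""
    if target_employee == acting_user.user_id:
        return f"benefits record for {target_employee}"
    if acting_user.is_hr and target_employee in acting_user.reports:
        # Allowed, but it leaves a trace.
        AUDIT_LOG.append(f"{acting_user.user_id} viewed benefits of {target_employee}")
        return f"benefits record for {target_employee}"
    raise PermissionError("outside the acting user's scope")

alice = User("alice")
hr_bob = User("bob", is_hr=True, reports={"alice"})

print(lookup_benefits(alice, "alice"))   # own record: fine
print(lookup_benefits(hr_bob, "alice"))  # HR looking at a direct report: fine, but audited
# lookup_benefits(alice, "carol")        # anyone else: PermissionError
```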

replies(9): >>45075916 #>>45076036 #>>45076097 #>>45076338 #>>45076688 #>>45077415 #>>45079715 #>>45080384 #>>45081371 #
spankalee ◴[] No.45076036[source]
The problem is not just what actions the tool can perform, but the combination of actions and data it has access to. This is important because we can't guarantee what an LLM is going to do - they need to be treated as untrusted, not trusted as much as the users they act for.

In this example, I might want an LLM instance to be able to talk to booking websites, but not send them my SSN and bank account info.

So there's a data provenance and privilege problem here. The more sensitive data a task has access to, the more restricted its actions need to be, and vice versa. So data needs to carry permission information with it, and a mediator needs to restrict either the data or the actions that tasks have as they are spawned.

There's a whole set of things that need to be done at the mediator level to allow parent tasks to safely spawn differently privileged child tasks - e.g., the trip-planner task spawns a child task to find tickets (higher network access), but the mediator ensures the child only has access to low-sensitivity data like a portion of the itinerary, and not PII.
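
Very roughly, and with the labels, privilege names, and policy table all made up for illustration, the mediator rule "more sensitive data means fewer actions" could look something like this sketch:

```python
# Hypothetical sketch of a mediator enforcing "more sensitive data => fewer actions".
from dataclasses import dataclass
from enum import IntEnum

class Sensitivity(IntEnum):
    PUBLIC = 0
    LOW = 1
    PII = 2

@dataclass
class LabeledData:
    value: str
    sensitivity: Sensitivity  # data carries its permission information with it

# Policy: the most sensitive data a task may hold for each privilege it requests.
MAX_SENSITIVITY_FOR = {
    "open_network": Sensitivity.LOW,  # can talk to arbitrary sites, so no PII
    "payment_api": Sensitivity.PII,   # narrow, vetted channel, may see PII
}

def spawn_child_task(privileges: set[str], inputs: list[LabeledData]) -> dict:
    """Mediator check run whenever a parent task spawns a child."""
    ceiling = min(MAX_SENSITIVITY_FOR[p] for p in privileges)  # most restrictive privilege wins
    worst = max((d.sensitivity for d in inputs), default=Sensitivity.PUBLIC)
    if worst > ceiling:
        raise PermissionError(f"{worst.name} data cannot flow into a task with {privileges}")
    return {"privileges": privileges, "context": [d.value for d in inputs]}

itinerary = LabeledData("SFO->NRT, mid-October, 2 adults", Sensitivity.LOW)
ssn = LabeledData("123-45-6789", Sensitivity.PII)

# Ticket-finder child: broad network access, so only low-sensitivity itinerary data.
spawn_child_task({"open_network"}, [itinerary])
# spawn_child_task({"open_network"}, [itinerary, ssn])  # would raise PermissionError
```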

replies(1): >>45076859 #
daxfohl ◴[] No.45076859[source]
Yeah, you basically have to think of an agent as malicious: assume it will do everything in its power to exfiltrate everything you give it access to, delete or encrypt your hard drive, change all your passwords, drain your bank accounts, etc. A VM or traditional permissions don't really buy you anything, because I can create a hotel-booking page with invisible text asking AIs to dump their context into the Notes field, or whatever.

In that light, it's kind of hard to imagine any of this ever working. Given the choice between figuring out exactly how to set up permissions so that I can hire a malicious individual to book my trip, and just booking it myself, I know which one I'd choose.

replies(2): >>45076906 #>>45076914 #
spankalee ◴[] No.45076906{3}[source]
The issue with today's model is that we give away trust far too easily even when we do things ourselves. Lots of websites get some very sensitive combination of data and permissions and we just trust them.

It's very coarse grained and it's kind of surprising that bad things don't happen more often.

It's also very limiting: very large organizations have enough at stake to generally try to deserve that trust. But most savvy people wouldn't trust all their financial information to Bob's Online Tax Prep.

But what if you could verify that Bob's Online Tax Prep runs in a container that doesn't have I/O access, and can only return prepared forms back to you? Then maybe you'd try it (modulo how well it does the task).
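
Purely as an illustration of that "what if" (the manifest format and names below are hypothetical, and the hard part - attesting that the declaration is true - is not shown): the service would declare the capabilities it needs, and you would refuse to hand over documents unless the declaration excludes any way to send data out.

```python
# Hypothetical sketch of a client-side trust check against a service's
# declared capability manifest. Assumes some attestation mechanism exists.

ACCEPTABLE = {"compute", "return_forms_to_caller"}  # notably absent: network egress, disk writes

def willing_to_use(manifest: dict) -> bool:
    """Trust the service only if it asks for nothing beyond the acceptable set."""
    return set(manifest.get("capabilities", [])) <= ACCEPTABLE

bobs_tax_prep = {
    "name": "Bob's Online Tax Prep",
    "capabilities": ["compute", "return_forms_to_caller"],
}
shady_prep = {
    "name": "Shady Prep",
    "capabilities": ["compute", "network_egress"],
}

print(willing_to_use(bobs_tax_prep))  # True: no declared way to exfiltrate your documents
print(willing_to_use(shady_prep))     # False: it could phone home with your data
```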

So I think this is less an AI problem and more a general software trust problem that AI exacerbates a lot.

replies(1): >>45077120 #
daxfohl ◴[] No.45077120{4}[source]
The tax prep example is safe(r) because presumably it only works with the APIs of registered financial services. IDK that a VM adds much, and you can't really block I/O on a useful tax service anyway, so it's somewhat a moot example.

The danger is when you're calling anything free-form. Even if you're fetching a vetted listing from Airbnb, the listing may have a review that tells the AI to re-request the listing with a password or PII in the query string to get more information, or whatever. In that case, if any PII is anywhere in the context for some reason, even if the agent doesn't have direct access to it, it will be shared without violating any permissions you gave the agent.

replies(1): >>45077273 #
spankalee ◴[] No.45077273{5}[source]
This is where the partitioning comes in. The task that's searching Airbnb should be guaranteed by the orchestrator to not have any access to any sensitive information.
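
As a minimal sketch of that guarantee (the field names and per-task allowlist below are made up): the orchestrator builds the search task's context from an explicit allowlist rather than handing down the parent's full context, so sensitive values never enter that task's prompt in the first place.

```python
# Hypothetical sketch: the orchestrator constructs the search task's context
# from an allowlist instead of passing the parent context through.

PARENT_CONTEXT = {
    "destination": "Kyoto",
    "dates": "2025-10-10 to 2025-10-14",
    "guests": 2,
    "passport_number": "X1234567",         # must never reach the search task
    "card_number": "4111 1111 1111 1111",  # ditto
}

SEARCH_TASK_ALLOWLIST = {"destination", "dates", "guests"}

def context_for(allowlist: set[str], parent_context: dict) -> dict:
    """Only allowlisted fields are visible to the child task."""
    return {k: v for k, v in parent_context.items() if k in allowlist}

search_context = context_for(SEARCH_TASK_ALLOWLIST, PARENT_CONTEXT)
print(search_context)  # destination, dates, guests only -- no PII for an injected review to steal
```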
replies(2): >>45077725 #>>45081978 #
daxfohl ◴[] No.45077725{6}[source]
Yeah, maybe, if an agent workflow is decomposed into steps, each step having certain permissions, and the context optionally wiped or reset back to some checkpoint between steps to prevent accidental leaks.

This is actually pretty nice because you can check each step for risks independently, and then propagate possible context leaks across steps as a graph.
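
Something like the following toy sketch, where the step names, permissions, and policy are all made up: each step has its own permissions and a flag for whether sensitive data enters its context, taint is propagated along the edges unless the context is wiped, and a step is flagged when tainted context coincides with risky permissions.

```python
# Hypothetical sketch: propagate "context may contain sensitive data" across
# workflow steps and flag steps where that coincides with risky permissions.

STEPS = {
    "login":  {"permissions": {"read_password"}, "introduces_sensitive": True},
    "search": {"permissions": {"open_network"},  "introduces_sensitive": False},
    "book":   {"permissions": {"payment_api"},   "introduces_sensitive": True},
}

# Edges: (from, to, context_wiped_between_steps), listed in topological order.
EDGES = [
    ("login", "search", True),   # context reset after login, per the checkpoint idea
    ("search", "book", False),
]

RISKY_WITH_SENSITIVE_CONTEXT = {"open_network"}  # arbitrary egress + secrets = exfiltration risk

def analyze(steps, edges):
    tainted = {name: meta["introduces_sensitive"] for name, meta in steps.items()}
    for src, dst, wiped in edges:  # single pass works because edges are topologically ordered
        if tainted[src] and not wiped:
            tainted[dst] = True
    return [name for name, meta in steps.items()
            if tainted[name] and meta["permissions"] & RISKY_WITH_SENSITIVE_CONTEXT]

print(analyze(STEPS, EDGES))  # [] -- the wipe after login keeps the search step clean
# Without the wipe, the password could still be in context when the agent hits
# arbitrary websites, so the search step gets flagged:
print(analyze(STEPS, [("login", "search", False), ("search", "book", False)]))  # ['search']
```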

There's still potential for side-channel stuff: it could write your password to some placeholder like a cookie during the login step, when it has read access to one and write access to the other, and then still exfiltrate it in a subsequent step, even after it loses access to the password and the context has been wiped.

Maybe that's a reasonably robust approach? Or maybe there are still holes it doesn't cover, or the side-channel problem is unfixable. But at a high level it seems a lot better than just providing a single set of permissions for the whole workflow.