←back to thread

780 points rexpository | 1 comments | | HN request time: 0s | source
Show context
tptacek ◴[] No.44503091[source]
This is just XSS mapped to LLMs. The problem, as is so often the case with admin apps (here "Cursor and the Supabase MCP" is an ad hoc admin app), is that they get a raw feed of untrusted user-generated content (they're internal scaffolding, after all).

In the classic admin app XSS, you file a support ticket with HTML and injected Javascript attributes. None of it renders in the customer-facing views, but the admin views are slapped together. An admin views the ticket (or even just a listing of all tickets) and now their session is owned up.

Here, just replace HTML with LLM instructions, the admin app with Cursor, the browser session with "access to the Supabase MCP".

replies(4): >>44503182 #>>44503194 #>>44503269 #>>44503304 #
ollien ◴[] No.44503304[source]
You're technically right, but by reducing the problem to being "just" another form of a classic internal XSS, missing the forest for the trees.

An XSS mitigation takes a blob of input and converts it into something that we can say with certainty will never execute. With prompt injection mitigation, there is no set of deterministic rules we can apply to a blob of input to make it "not LLM instructions". To this end, it is fundamentally unsafe to feed _any_ untrusted input into an LLM that has access to privileged information.

replies(2): >>44503346 #>>44503483 #
tptacek ◴[] No.44503346[source]
Seems pretty simple: the MCP calls are like an eval(), and untrusted input can't ever hit it. Your success screening and filtering LLM'd eval() inputs will be about as successful as your attempts to sanitize user-generated content before passing them to an eval().

eval() --- still pretty useful!

replies(2): >>44503430 #>>44503440 #
losvedir ◴[] No.44503440[source]
The problem is, as you say, eval() is still useful! And having LLMs digest or otherwise operate on untrusted input is one of its stronger use cases.

I know you're pretty pro-LLM, and have talked about fly.io writing their own agents. Do you have a different solution to the "trifecta" Simon talks about here? Do you just take the stance that agents shouldn't work with untrusted input?

Yes, it feels like this is "just" XSS, which is "just" a category of injection, but it's not obvious to me the way to solve it, the way it is with the others.

replies(2): >>44503477 #>>44503511 #
1. refulgentis ◴[] No.44503511{3}[source]
> but it's not obvious to me the way to solve it

It reduces down to untrusted input with a confused deputy.

Thus, I'd play with the argument it is obvious.

Those are both well-trodden and well-understood scenarios, before LLMs were a speck of a gleam in a researcher's eye.

I believe that leaves us with exactly 3 concrete solutions:

#1) Users don't provide both private read and public write tools in the same call - IIRC that's simonw's prescription & also why he points out these scenarios.

#2) We have a non-confusable deputy, i.e. omniscient. (I don't think this achievable, ever, either with humans or silicon)

#3) We use two deputies, one of which only has tools that are private read, another that are public write (this is the approach behind e.g. Google's CAMEL, but I'm oversimplifying. IIRC Camel is more the general observation that N-deputies is the only way out of this that doesn't involve just saying PEBKAC, i.e. #1)