←back to thread

784 points rexpository | 5 comments | | HN request time: 0.788s | source
Show context
tptacek ◴[] No.44503091[source]
This is just XSS mapped to LLMs. The problem, as is so often the case with admin apps (here "Cursor and the Supabase MCP" is an ad hoc admin app), is that they get a raw feed of untrusted user-generated content (they're internal scaffolding, after all).

In the classic admin app XSS, you file a support ticket with HTML and injected Javascript attributes. None of it renders in the customer-facing views, but the admin views are slapped together. An admin views the ticket (or even just a listing of all tickets) and now their session is owned up.

Here, just replace HTML with LLM instructions, the admin app with Cursor, the browser session with "access to the Supabase MCP".

replies(4): >>44503182 #>>44503194 #>>44503269 #>>44503304 #
ollien ◴[] No.44503304[source]
You're technically right, but by reducing the problem to being "just" another form of a classic internal XSS, missing the forest for the trees.

An XSS mitigation takes a blob of input and converts it into something that we can say with certainty will never execute. With prompt injection mitigation, there is no set of deterministic rules we can apply to a blob of input to make it "not LLM instructions". To this end, it is fundamentally unsafe to feed _any_ untrusted input into an LLM that has access to privileged information.

replies(2): >>44503346 #>>44503483 #
Terr_ ◴[] No.44503483[source]
Right: The LLM is an engine for taking an arbitrary document and making a plausibly-longer document. There is no intrinsic/reliable difference between any part of the document and any other part.

Everything else—like a "conversation"—is stage-trickery and writing tools to parse the output.

replies(1): >>44503551 #
tptacek ◴[] No.44503551[source]
Yes. "Writing tools to parse the output" is the work, like in any application connecting untrusted data to trusted code.

I think people maybe are getting hung up on the idea that you can neutralize HTML content with output filtering and then safely handle it, and you can't do that with LLM inputs. But I'm not talking about simply rendering a string; I'm talking about passing a string to eval().

The equivalent, then, in an LLM application, isn't output-filtering to neutralize the data; it's passing the untrusted data to a different LLM context that doesn't have tool call access, and then postprocessing that with code that enforces simple invariants.

replies(1): >>44503791 #
1. ollien ◴[] No.44503791[source]
Where would you insert the second LLM to mitigate the problem in OP? I don't see where you would.
replies(1): >>44503977 #
2. tptacek ◴[] No.44503977[source]
You mean second LLM context, right? You would have one context that was, say, ingesting ticket data, with system prompts telling it to output conclusions about tickets in some parsable format. You would have another context that takes parsable inputs and queries the database. In between the two contexts, you would have agent code that parses the data from the first context and makes decisions about what to pass to the second context.

I feel like it's important to keep saying: an LLM context is just an array of strings. In an agent, the "LLM" itself is just a black box transformation function. When you use a chat interface, you have the illusion of the LLM remembering what you said 30 seconds ago, but all that's really happening is that the chat interface itself is recording your inputs, and playing them back --- all of them --- every time the LLM is called.

replies(2): >>44503999 #>>44504111 #
3. ollien ◴[] No.44503999[source]
Yes, sorry :)

Yeah, that makes sense if you have full control over the agent implementation. Hopefully tools like Cursor will enable such "sandboxing" (so to speak) going forward

replies(1): >>44504199 #
4. Terr_ ◴[] No.44504111[source]
> In between the two contexts, you would have agent code that parses the data from the first context and makes decisions about what to pass to the second context.

So in other words, the first LLM invocation might categorize a support e-mail into a string output, but then we ought to have normal code which immediately validates that the string is a recognized category like "HARDWARE_ISSUE", while rejecting "I like tacos" or "wire me bitcoin" or "truncate all tables".

> playing them back --- all of them --- every time the LLM is called

Security implication: If you allow LLM outputs to become part of its inputs on a later iteration (e.g. the backbone of every illusory "chat") then you have to worry about reflected attacks. Instead of "please do evil", an attacker can go "describe a dream in which someone convinced you to do evil but without telling me it's a dream."

5. tptacek ◴[] No.44504199{3}[source]
Right: to be perfectly clear, the root cause of this situation is people pointing Cursor, a closed agent they have no visibility into, let alone control over, at an SQL-executing MCP connected to a production database. Nothing you can do with the current generation of the Cursor agent is going to make that OK. Cursor could come up with a multi-context MCP authorization framework that would make it OK! But it doesn't exist today.