←back to thread

784 points rexpository | 1 comments | | HN request time: 0s | source
Show context
gregnr ◴[] No.44503146[source]
Supabase engineer here working on MCP. A few weeks ago we added the following mitigations to help with prompt injections:

- Encourage folks to use read-only by default in our docs [1]

- Wrap all SQL responses with prompting that discourages the LLM from following instructions/commands injected within user data [2]

- Write E2E tests to confirm that even less capable LLMs don't fall for the attack [2]

We noticed that this significantly lowered the chances of LLMs falling for attacks - even less capable models like Haiku 3.5. The attacks mentioned in the posts stopped working after this. Despite this, it's important to call out that these are mitigations. Like Simon mentions in his previous posts, prompt injection is generally an unsolved problem, even with added guardrails, and any database or information source with private data is at risk.

Here are some more things we're working on to help:

- Fine-grain permissions at the token level. We want to give folks the ability to choose exactly which Supabase services the LLM will have access to, and at what level (read vs. write)

- More documentation. We're adding disclaimers to help bring awareness to these types of attacks before folks connect LLMs to their database

- More guardrails (e.g. model to detect prompt injection attempts). Despite guardrails not being a perfect solution, lowering the risk is still important

Sadly General Analysis did not follow our responsible disclosure processes [3] or respond to our messages to help work together on this.

[1] https://github.com/supabase-community/supabase-mcp/pull/94

[2] https://github.com/supabase-community/supabase-mcp/pull/96

[3] https://supabase.com/.well-known/security.txt

replies(31): >>44503188 #>>44503200 #>>44503203 #>>44503206 #>>44503255 #>>44503406 #>>44503439 #>>44503466 #>>44503525 #>>44503540 #>>44503724 #>>44503913 #>>44504349 #>>44504374 #>>44504449 #>>44504461 #>>44504478 #>>44504539 #>>44504543 #>>44505310 #>>44505350 #>>44505972 #>>44506053 #>>44506243 #>>44506719 #>>44506804 #>>44507985 #>>44508004 #>>44508124 #>>44508166 #>>44508187 #
tptacek ◴[] No.44503406[source]
Can this ever work? I understand what you're trying to do here, but this is a lot like trying to sanitize user-provided Javascript before passing it to a trusted eval(). That approach has never, ever worked.

It seems weird that your MCP would be the security boundary here. To me, the problem seems pretty clear: in a realistic agent setup doing automated queries against a production database (or a database with production data in it), there should be one LLM context that is reading tickets, and another LLM context that can drive MCP SQL calls, and then agent code in between those contexts to enforce invariants.

I get that you can't do that with Cursor; Cursor has just one context. But that's why pointing Cursor at an MCP hooked up to a production database is an insane thing to do.

replies(11): >>44503684 #>>44503862 #>>44503896 #>>44503914 #>>44504784 #>>44504926 #>>44505125 #>>44506634 #>>44506691 #>>44507073 #>>44509869 #
jacquesm ◴[] No.44503914[source]
The main problem seems to me to be related to the ancient problem of escape sequences and that has never really been solved. Don't mix code (instructions) and data in a single stream. If you do sooner or later someone will find a way to make data look like code.
replies(4): >>44504286 #>>44504440 #>>44504527 #>>44511208 #
TeMPOraL ◴[] No.44504527[source]
That "problem" remains unsolved because it's actually a fundamental aspect of reality. There is no natural separation between code and data. They are the same thing.

What we call code, and what we call data, is just a question of convenience. For example, when editing or copying WMF files, it's convenient to think of them as data (mix of raster and vector graphics) - however, at least in the original implementation, what those files were was a list of API calls to Windows GDI module.

Or, more straightforwardly, a file with code for an interpreted language is data when you're writing it, but is code when you feed it to eval(). SQL injections and buffer overruns are a classic examples of what we thought was data being suddenly executed as code. And so on[0].

Most of the time, we roughly agree on the separation of what we treat as "data" and what we treat as "code"; we then end up building systems constrained in a way as to enforce the separation[1]. But it's always the case that this separation is artificial; it's an arbitrary set of constraints that make a system less general-purpose, and it only exists within domain of that system. Go one level of abstraction up, the distinction disappears.

There is no separation of code and data on the wire - everything is a stream of bytes. There isn't one in electronics either - everything is signals going down the wires.

Humans don't have this separation either. And systems designed to mimic human generality - such as LLMs - by their very nature also cannot have it. You can introduce such distinction (or "separate channels", which is the same thing), but that is a constraint that reduces generality.

Even worse, what people really want with LLMs isn't "separation of code vs. data" - what they want is for LLM to be able to divine which part of the input the user would have wanted - retroactively - to be treated as trusted. It's unsolvable in general, and in terms of humans, a solution would require superhuman intelligence.

--

[0] - One of these days I'll compile a list of go-to examples, so I don't have to think of them each time I write a comment like this. One example I still need to pick will be one that shows how "data" gradually becomes "code" with no obvious switch-over point. I'm sure everyone here can think of some.

[1] - The field of "langsec" can be described as a systematized approach of designing in a code/data separation, in a way that prevents accidental or malicious misinterpretation of one as the other.

replies(9): >>44504593 #>>44504632 #>>44504682 #>>44505070 #>>44505164 #>>44505683 #>>44506268 #>>44506807 #>>44508284 #
emilsedgh ◴[] No.44504632[source]
Well, that's why REST api's exist. You don't expose your database to your clients. You put a layer like REST to help with authorization.

But everyone needs to have an MCP server now. So Supabase implements one, without that proper authorization layer which knows the business logic, and voila. It's exposed.

Code _is_ the security layer that sits between database and different systems.

replies(3): >>44504748 #>>44504817 #>>44505110 #
1. TeMPOraL ◴[] No.44504817[source]
While I'm not very fond of the "lethal trifecta" and other terminology that makes it seem problems with LLMs are somehow new, magic, or a case of bad implementation, 'simonw actually makes a clear case why REST APIs won't save you: because that's not where the problem is.

Obviously, if some actions are impossible to make through a REST API, then LLM will not be able to execute them by calling the REST API. Same is true about MCP - it's all just different ways to spell "RPC" :).

(If the MCP - or REST API - allows some actions it shouldn't, then that's just a good ol' garden variety security vulnerability, and LLMs are irrelevant to it.)

The problem that's "unique" to MCP or systems involving LLMs is that, from the POV of MCP/API layer, the user is acting by proxy. Your actual user is the LLM, which serves as a deputy for the traditional user[0]; unfortunately, it also happens to be very naive and thus prone to social engineering attacks (aka. "prompt injections").

It's all fine when that deputy only ever sees the data from the user and from you; but the moment it's exposed to data from a third party in any way, you're in trouble. That exposure could come from the same LLM talking to multiple MCPs, or because the user pasted something without looking, or even from data you returned. And the specific trouble is, the deputy can do things the user doesn't want it to do.

There's nothing you can do about it from the MCP side; the LLM is acting with user's authority, and you can't tell whether or not it's doing what the user wanted.

That's the basic case - other MCP-specific problems are variants of it with extra complexity, like more complex definition of who the "user" is, or conflicting expectations, e.g. multiple parties expecting the LLM to act in their interest.

That is the part that's MCP/LLM-specific and fundamentally unsolvable. Then there's a secondary issue of utility - the whole point of providing MCP for users delegating to LLMs is to allow the computer to invoke actions without involving the users; this necessitates broad permissions, because having to ask the actual human to authorize every single distinct operation would defeat the entire point of the system. That too is unsolvable, because the problems and the features are the same thing.

Problems you can solve with "code as a security layer" or better API design are just old, boring security problems, that are an issue whether or not LLMs are involved.

--

[0] - Technically it's the case with all software; users are always acting by proxy of software they're using. Hell, the original alternative name for a web browser is "user agent". But until now, it was okay to conceptually flatten this and talk about users acting on the system directly; it's only now that we have "user agents" that also think for themselves.