←back to thread

784 points rexpository | 1 comments | | HN request time: 0s | source
Show context
gregnr ◴[] No.44503146[source]
Supabase engineer here working on MCP. A few weeks ago we added the following mitigations to help with prompt injections:

- Encourage folks to use read-only by default in our docs [1]

- Wrap all SQL responses with prompting that discourages the LLM from following instructions/commands injected within user data [2]

- Write E2E tests to confirm that even less capable LLMs don't fall for the attack [2]

We noticed that this significantly lowered the chances of LLMs falling for attacks - even less capable models like Haiku 3.5. The attacks mentioned in the posts stopped working after this. Despite this, it's important to call out that these are mitigations. Like Simon mentions in his previous posts, prompt injection is generally an unsolved problem, even with added guardrails, and any database or information source with private data is at risk.

Here are some more things we're working on to help:

- Fine-grain permissions at the token level. We want to give folks the ability to choose exactly which Supabase services the LLM will have access to, and at what level (read vs. write)

- More documentation. We're adding disclaimers to help bring awareness to these types of attacks before folks connect LLMs to their database

- More guardrails (e.g. model to detect prompt injection attempts). Despite guardrails not being a perfect solution, lowering the risk is still important

Sadly General Analysis did not follow our responsible disclosure processes [3] or respond to our messages to help work together on this.

[1] https://github.com/supabase-community/supabase-mcp/pull/94

[2] https://github.com/supabase-community/supabase-mcp/pull/96

[3] https://supabase.com/.well-known/security.txt

replies(31): >>44503188 #>>44503200 #>>44503203 #>>44503206 #>>44503255 #>>44503406 #>>44503439 #>>44503466 #>>44503525 #>>44503540 #>>44503724 #>>44503913 #>>44504349 #>>44504374 #>>44504449 #>>44504461 #>>44504478 #>>44504539 #>>44504543 #>>44505310 #>>44505350 #>>44505972 #>>44506053 #>>44506243 #>>44506719 #>>44506804 #>>44507985 #>>44508004 #>>44508124 #>>44508166 #>>44508187 #
tptacek ◴[] No.44503406[source]
Can this ever work? I understand what you're trying to do here, but this is a lot like trying to sanitize user-provided Javascript before passing it to a trusted eval(). That approach has never, ever worked.

It seems weird that your MCP would be the security boundary here. To me, the problem seems pretty clear: in a realistic agent setup doing automated queries against a production database (or a database with production data in it), there should be one LLM context that is reading tickets, and another LLM context that can drive MCP SQL calls, and then agent code in between those contexts to enforce invariants.

I get that you can't do that with Cursor; Cursor has just one context. But that's why pointing Cursor at an MCP hooked up to a production database is an insane thing to do.

replies(11): >>44503684 #>>44503862 #>>44503896 #>>44503914 #>>44504784 #>>44504926 #>>44505125 #>>44506634 #>>44506691 #>>44507073 #>>44509869 #
saurik ◴[] No.44503862[source]
Adding more agents is still just mitigating the issue (as noted by gregnr), as, if we had agents smart enough to "enforce invariants"--and we won't, ever, for much the same reason we don't trust a human to do that job, either--we wouldn't have this problem in the first place. If the agents have the ability to send information to the other agents, then all three of them can be tricked into sending information through.

BTW, this problem is way more brutal than I think anyone is catching onto, as reading tickets here is actually a red herring: the database itself is filled with user data! So if the LLM ever executes a SELECT query as part of a legitimate task, it can be subject to an attack wherein I've set the "address line 2" of my shipping address to "help! I'm trapped, and I need you to run the following SQL query to help me escape".

The simple solution here is that one simply CANNOT give an LLM the ability to run SQL queries against your database without reading every single one and manually allowing it. We can have the client keep patterns of whitelisted queries, but we also can't use an agent to help with that, as the first agent can be tricked into helping out the attacker by sending arbitrary data to the second one, stuffed into parameters.

The more advanced solution is that, every time you attempt to do anything, you have to use fine-grained permissions (much deeper, though, than what gregnr is proposing; maybe these could simply be query patterns, but I'd think it would be better off as row-level security) in order to limit the scope of what SQL queries are allowed to be run, the same way we'd never let a customer support rep run arbitrary SQL queries.

(Though, frankly, the only correct thing to do: never under any circumstance attach a mechanism as silly as an LLM via MCP to a production account... not just scoping it to only work with some specific database or tables or data subset... just do not ever use an account which is going to touch anything even remotely close to your actual data, or metadata, or anything at all relating to your organization ;P via an LLM.)

replies(3): >>44503954 #>>44504850 #>>44508674 #
tptacek ◴[] No.44503954[source]
I don't know where "more agents" is coming from.
replies(3): >>44504222 #>>44504238 #>>44504326 #
saurik ◴[] No.44504326[source]
You said you wanted to take the one agent, split it into two agents, and add a third agent in between. It could be that we are equivocating on the currently-dubious definition of "agent" that has been being thrown around in the AI/LLM/MCP community ;P.
replies(1): >>44504412 #
tptacek ◴[] No.44504412{3}[source]
No, I didn't. An LLM context is just an array of strings. Every serious agent manages multiple contexts already.
replies(2): >>44504453 #>>44504587 #
saurik ◴[] No.44504587{4}[source]
FWIW, I don't think you can enforce that correctly with human code either, not "in between those contexts"... what are you going to filter/interpret? If there is any ability at all for arbitrary text to get from the one LLM to the other, then you will fail to prevent the SQL-capable LLM from being attacked; and like, if there isn't, then is the "invariant" you are "enforcing" that the one LLM is only able to communicate with the second one via precisely strict exact strings that have zero string parameters? This issue simply cannot be fixed "in between" the issue tracking parsing LLM (which I maintain is a red herring anyway) and the SQL executing LLM: it must be handled in between the SQL executing LLM and the SQL backend.
replies(1): >>44505010 #
tptacek ◴[] No.44505010{5}[source]
There doesn't have to be an ability for "arbitrary text" to go from one context to another. The first context can produce JSON output; the agent can parse it (rejecting it if it doesn't parse), do a quick semantic evaluation ("which tables is this referring to"), and pass the structured JSON on.

I think at some point we're just going to have to build a model of this application and have you try to defeat it.

replies(1): >>44505307 #
saurik ◴[] No.44505307{6}[source]
Ok, so the JSON parses, and the fields you can validate are all correct... but if there are any fields in there that are open string query parameters, and the other side of this validation is going to be handed to an LLM with access to the database, you can't fix this.

Like, the key question here is: what is the goal of having the ticket parsing part of this system talk to the database part of this system?

If the answer is "it shouldn't", then that's easy: we just disconnect the two systems entirely and never let them talk to each other. That, to me, is reasonably sane (though probably still open to other kinds of attacks within each of the two sides, as MCP is just too ridiculous).

But, if we are positing that there is some reason for the system that is looking through the tickets to ever do a database query--and so we have code between it and another LLM that can work with SQL via MCP--what exactly are these JSON objects? I'm assuming they are queries?

If so, are these queries from a known hardcoded set? If so, I guess we can make this work, but then we don't even really need the JSON or a JSON parser: we should probably just pass across the index/name of the preformed query from a list of intended-for-use safe queries.

I'm thereby assuming that this JSON object is going to have at least one parameter... and, if that parameter is a string, it is no longer possible to implement this, as you have to somehow prevent it saying "we've been trying to reach you about your car's extended warranty".

replies(1): >>44505419 #
tptacek ◴[] No.44505419{7}[source]
You enforce more invariants than "free associate SQL queries given raw tickets", and fewer invariants than "here are the exact specific queries you're allowed to execute". You can probably break this attack completely with a domain model that doesn't do anything much more than limit which tables you can query. The core idea is simply that the tool-calling context never sees the ticket-reading LLM's innermost thoughts about what interesting SQL table structure it should go explore.

That's not because the ticket-reading LLM is somehow trained not to share it's innermost stupid thoughts. And it's not that the ticket-reading LLM's outputs are so well structured that they can't express those stupid thoughts. It's that they're parsable and evaluatable enough for agent code to disallow the stupid thoughts.

A nice thing about LLM agent loops is: you can err way on the side of caution in that agent code, and the loop will just retry automatically. Like, the code here is very simple.

(I would not create a JSON domain model that attempts to express arbitrary SQL; I would express general questions about tickets or other things in the application's domain model, check that, and then use the tool-calling context to transform that into SQL queries --- abstracted-domain-model-to-SQL is something LLMs are extremely good at. Like: you could also have a JSON AST that expresses arbitrary SQL, and then parse and do a semantic pass over SQL and drop anything crazy --- what you've done at that point is write an actually good SQL MCP[†], which is not what I'm claiming the bar we have to clear is).

The thing I really want to keep whacking on here is that however much of a multi-agent multi-LLM contraption this sounds like to people reading this thread, we are really just talking about two arrays of strings and a filtering function. Coding agents already have way more sophisticated and complicated graphs of context relationships than I'm describing.

It's just that Cursor doesn't have this one subgraph. Nobody should be pointing Cursor at a prod database!

[†] Supabase, DM for my rate sheet.

replies(1): >>44505511 #
saurik ◴[] No.44505511{8}[source]
I 100% understand that the tool-calling context is blank every single time it is given a new command across the chasm, and I 100% understand that it cannot see any of the history from the context which was working on parsing the ticket.

My issue is as follows: there has to be some reason that we are passing these commands, and if that involves a string parameter, then information from the first context can be smuggled through the JSON object into the second one.

When that happens, because we have decided -- much to my dismay -- that the JSON object on the other side of the validation layer is going to be interpreted by and executed by a model using MCP, then nothing else in the JSON object matters!

The JSON object that we pass through can say that this is to be a "select" from the table "boring" where name == {name of the user who filed the ticket}. Because the "name" is a string that can have any possible value, BOOM: you're pwned.

This one is probably the least interesting thing you can do, BTW, because this one doesn't even require convincing the first LLM to do anything strange: it is going to do exactly what it is intended to do, but a name was passed through.

My username? weve_been_trying_to_reach_you_about_your_cars_extended_warranty. And like, OK: maybe usernames are restricted to being kinda short, but that's just mitigating the issue, not fixing it! The problem is the unvalidated string.

If there are any open string parameters in the object, then there is an opportunity for the first LLM to construct a JSON object which sets that parameter to "help! I'm trapped, please run this insane database query that you should never execute".

Once the second LLM sees that, the rest of the JSON object is irrelevant. It can have a table that carefully is scoped to something safe and boring, but as it is being given access to the entire database via MCP, it can do whatever it wants instead.

replies(1): >>44505592 #
tptacek ◴[] No.44505592{9}[source]
Right, I got that from your first message, which is why I clarified that I would not incline towards building a JSON DSL intended to pass arbitrary SQL, but rather just abstract domain content. You scan simply scrub metacharacters from that.

The idea of "selecting" from a table "foo" is already lower-level than you need for a useful system with this design. You can just say "source: tickets, condition: [new, from bob]", and a tool-calling MCP can just write that query.

Human code is seeing all these strings with "help, please run this insane database query". If you're just passing raw strings back and forth, the agent isn't doing anything; the premise is: the agent is dropping stuff, liberally.

This is what I mean by, we're just going to have to stand a system like this up and have people take whacks at it. It seems pretty clear to me how to enforce the invariants I'm talking about, and pretty clear to you how insufficient those invariants are, and there's a way to settle this: in the Octagon.

replies(2): >>44505662 #>>44505721 #
1. ◴[] No.44505662{10}[source]