Most active commenters

TeMPOraL(15)
andy99(3)
anonymars(3)
ethbr1(3)

Popular/hot comments

>>44504527 #
>>44504632 #
>>44505179 #

←back to thread

Supabase MCP can leak your entire SQL database

(www.generalanalysis.com)

Show context

gregnr ◴[08 Jul 25 19:20 UTC] No.44503146[source]▶

>>44502318 (OP) #

Supabase engineer here working on MCP. A few weeks ago we added the following mitigations to help with prompt injections:

- Encourage folks to use read-only by default in our docs [1]

- Wrap all SQL responses with prompting that discourages the LLM from following instructions/commands injected within user data [2]

- Write E2E tests to confirm that even less capable LLMs don't fall for the attack [2]

We noticed that this significantly lowered the chances of LLMs falling for attacks - even less capable models like Haiku 3.5. The attacks mentioned in the posts stopped working after this. Despite this, it's important to call out that these are mitigations. Like Simon mentions in his previous posts, prompt injection is generally an unsolved problem, even with added guardrails, and any database or information source with private data is at risk.

Here are some more things we're working on to help:

- Fine-grain permissions at the token level. We want to give folks the ability to choose exactly which Supabase services the LLM will have access to, and at what level (read vs. write)

- More documentation. We're adding disclaimers to help bring awareness to these types of attacks before folks connect LLMs to their database

- More guardrails (e.g. model to detect prompt injection attempts). Despite guardrails not being a perfect solution, lowering the risk is still important

Sadly General Analysis did not follow our responsible disclosure processes [3] or respond to our messages to help work together on this.

[1] https://github.com/supabase-community/supabase-mcp/pull/94

[2] https://github.com/supabase-community/supabase-mcp/pull/96

[3] https://supabase.com/.well-known/security.txt

replies(31): >>44503188 #>>44503200 #>>44503203 #>>44503206 #>>44503255 #>>44503406 #>>44503439 #>>44503466 #>>44503525 #>>44503540 #>>44503724 #>>44503913 #>>44504349 #>>44504374 #>>44504449 #>>44504461 #>>44504478 #>>44504539 #>>44504543 #>>44505310 #>>44505350 #>>44505972 #>>44506053 #>>44506243 #>>44506719 #>>44506804 #>>44507985 #>>44508004 #>>44508124 #>>44508166 #>>44508187 #

tptacek ◴[08 Jul 25 19:51 UTC] No.44503406[source]▶

>>44503146 #

Can this ever work? I understand what you're trying to do here, but this is a lot like trying to sanitize user-provided Javascript before passing it to a trusted eval(). That approach has never, ever worked.

It seems weird that your MCP would be the security boundary here. To me, the problem seems pretty clear: in a realistic agent setup doing automated queries against a production database (or a database with production data in it), there should be one LLM context that is reading tickets, and another LLM context that can drive MCP SQL calls, and then agent code in between those contexts to enforce invariants.

I get that you can't do that with Cursor; Cursor has just one context. But that's why pointing Cursor at an MCP hooked up to a production database is an insane thing to do.

replies(11): >>44503684 #>>44503862 #>>44503896 #>>44503914 #>>44504784 #>>44504926 #>>44505125 #>>44506634 #>>44506691 #>>44507073 #>>44509869 #

1. jacquesm ◴[08 Jul 25 20:44 UTC] No.44503914[source]▶

>>44503406 #

The main problem seems to me to be related to the ancient problem of escape sequences and that has never really been solved. Don't mix code (instructions) and data in a single stream. If you do sooner or later someone will find a way to make data look like code.

replies(4): >>44504286 #>>44504440 #>>44504527 #>>44511208 #

2. cyanydeez ◴[08 Jul 25 21:37 UTC] No.44504286[source]▶

>>44503914 (TP) #

Others Have pointed out one would need to train a new model that separated code and data because none of the models have any idea what either is.

It probably boils down a determistic and non deterministic problem set, like a compiler vs a interpretor.

replies(1): >>44504342 #

3. andy99 ◴[08 Jul 25 21:46 UTC] No.44504342[source]▶

>>44504286 #

You'd need a different architecture, not just training. They already train LLMs to separate instructions and data, to the best of their ability. But an LLM is a classifier, there's some input that adversarrially forces a particular class prediction.

The analogy I like is it's like a keyed lock. If it can let a key in, it can let an attackers pick in - you can have traps and flaps and levers and whatnot, but its operation depends on letting something in there, so if you want it to work you accept that it's only so secure.

replies(1): >>44504631 #

4. the8472 ◴[08 Jul 25 21:58 UTC] No.44504440[source]▶

>>44503914 (TP) #

https://cwe.mitre.org/data/definitions/990.html

5. TeMPOraL ◴[08 Jul 25 22:15 UTC] No.44504527[source]▶

>>44503914 (TP) #

That "problem" remains unsolved because it's actually a fundamental aspect of reality. There is no natural separation between code and data. They are the same thing.

What we call code, and what we call data, is just a question of convenience. For example, when editing or copying WMF files, it's convenient to think of them as data (mix of raster and vector graphics) - however, at least in the original implementation, what those files were was a list of API calls to Windows GDI module.

Or, more straightforwardly, a file with code for an interpreted language is data when you're writing it, but is code when you feed it to eval(). SQL injections and buffer overruns are a classic examples of what we thought was data being suddenly executed as code. And so on[0].

Most of the time, we roughly agree on the separation of what we treat as "data" and what we treat as "code"; we then end up building systems constrained in a way as to enforce the separation[1]. But it's always the case that this separation is artificial; it's an arbitrary set of constraints that make a system less general-purpose, and it only exists within domain of that system. Go one level of abstraction up, the distinction disappears.

There is no separation of code and data on the wire - everything is a stream of bytes. There isn't one in electronics either - everything is signals going down the wires.

Humans don't have this separation either. And systems designed to mimic human generality - such as LLMs - by their very nature also cannot have it. You can introduce such distinction (or "separate channels", which is the same thing), but that is a constraint that reduces generality.

Even worse, what people really want with LLMs isn't "separation of code vs. data" - what they want is for LLM to be able to divine which part of the input the user would have wanted - retroactively - to be treated as trusted. It's unsolvable in general, and in terms of humans, a solution would require superhuman intelligence.

[0] - One of these days I'll compile a list of go-to examples, so I don't have to think of them each time I write a comment like this. One example I still need to pick will be one that shows how "data" gradually becomes "code" with no obvious switch-over point. I'm sure everyone here can think of some.

[1] - The field of "langsec" can be described as a systematized approach of designing in a code/data separation, in a way that prevents accidental or malicious misinterpretation of one as the other.

replies(9): >>44504593 #>>44504632 #>>44504682 #>>44505070 #>>44505164 #>>44505683 #>>44506268 #>>44506807 #>>44508284 #

6. layoric ◴[08 Jul 25 22:26 UTC] No.44504593[source]▶

>>44504527 #

Spot on. The issue I think a lot of devs are grappling with is the non deterministic nature of LLMs. We can protect against SQL injection and prove that it will block those attacks. With LLMs, you just can’t do that.

replies(1): >>44504667 #

7. TeMPOraL ◴[08 Jul 25 22:31 UTC] No.44504631{3}[source]▶

>>44504342 #

The analogy I like is... humans[0].

There's literally no way to separate "code" and "data" for humans. No matter how you set things up, there's always a chance of some contextual override that will make them reinterpret the inputs given new information.

Imagine you get a stack of printouts with some numbers or code, and are tasked with typing them into a spreadsheet. You're told this is all just random test data, but also a trade secret, so you're just to type all that in but otherwise don't interpret it or talk about it outside work. Pretty normal, pretty boring.

You're half-way through, and then suddenly a clean row of data breaks into a message. ACCIDENT IN LAB 2, TRAPPED, PEOPLE BADLY HURT, IF YOU SEE THIS, CALL 911.

What do you do?

Consider how would you behave. Then consider what could your employer do better to make sure you ignore such messages. Then think of what kind of message would make you act on it anyways.

In a fully general system, there's always some way for parts that come later to recontextualize the parts that came before.

[0] - That's another argument in favor of anthropomorphising LLMs on a cognitive level.

replies(2): >>44504963 #>>44504992 #

8. emilsedgh ◴[08 Jul 25 22:31 UTC] No.44504632[source]▶

>>44504527 #

Well, that's why REST api's exist. You don't expose your database to your clients. You put a layer like REST to help with authorization.

But everyone needs to have an MCP server now. So Supabase implements one, without that proper authorization layer which knows the business logic, and voila. It's exposed.

Code _is_ the security layer that sits between database and different systems.

replies(3): >>44504748 #>>44504817 #>>44505110 #

9. TeMPOraL ◴[08 Jul 25 22:36 UTC] No.44504667{3}[source]▶

>>44504593 #

It's not the non-determinism that's a problem by itself - it's that the system is intended to be general, and you can't even enumerate ways it can be made to do something you don't want it to do, much less restrict it without compromising the features you want.

Or, put in a different way, it's the case where you want your users to be able to execute arbitrary SQL against your database, a case where that's a core feature - except, you also want it to magically not execute SQL that you or the users will, in the future, think shouldn't have been executed.

replies(1): >>44508767 #

10. szvsw ◴[08 Jul 25 22:38 UTC] No.44504682[source]▶

>>44504527 #

> That "problem" remains unsolved because it's actually a fundamental aspect of reality. There is no natural separation between code and data. They are the same thing.

Sorry to perhaps diverge into looser analogy from your excellent, focused technical unpacking of that statement, but I think another potentially interesting thread of it would be the proof of Godel’s Incompleteness Theorem, in as much as the Godel Sentence can be - kind of - thought of as an injection attack by blurring the boundaries between expressive instruction sets (code) and the medium which carries them (which can itself become data). In other words, an escape sequence attack leverages the fact that the malicious text is operated on by a program (and hijacks the program) which is itself also encoded in the same syntactic form as the attacking text, and similarly, the Godel sentence leverages the fact that the thing which it operates on and speaks about is itself also something which can operate and speak… so to speak. Or in other words, when the data becomes code, you have a problem (or if the code can be data, you have a problem), and in the Godel Sentence, that is exactly what happens.

Hopefully that made some sense… it’s been 10 years since undergrad model theory and logic proofs…

Oh, and I guess my point in raising this was just to illustrate that it really is a pretty fundamental, deep problem of formal systems more generally that you are highlighting.

replies(2): >>44504910 #>>44505296 #

11. raspasov ◴[08 Jul 25 22:52 UTC] No.44504748{3}[source]▶

>>44504632 #

I was thinking the same thing.

Who, except for a total naive beginner, exposes a database directly to an LLM that accepts public input, of all things?

12. TeMPOraL ◴[08 Jul 25 23:05 UTC] No.44504817{3}[source]▶

>>44504632 #

While I'm not very fond of the "lethal trifecta" and other terminology that makes it seem problems with LLMs are somehow new, magic, or a case of bad implementation, 'simonw actually makes a clear case why REST APIs won't save you: because that's not where the problem is.

Obviously, if some actions are impossible to make through a REST API, then LLM will not be able to execute them by calling the REST API. Same is true about MCP - it's all just different ways to spell "RPC" :).

(If the MCP - or REST API - allows some actions it shouldn't, then that's just a good ol' garden variety security vulnerability, and LLMs are irrelevant to it.)

The problem that's "unique" to MCP or systems involving LLMs is that, from the POV of MCP/API layer, the user is acting by proxy. Your actual user is the LLM, which serves as a deputy for the traditional user[0]; unfortunately, it also happens to be very naive and thus prone to social engineering attacks (aka. "prompt injections").

It's all fine when that deputy only ever sees the data from the user and from you; but the moment it's exposed to data from a third party in any way, you're in trouble. That exposure could come from the same LLM talking to multiple MCPs, or because the user pasted something without looking, or even from data you returned. And the specific trouble is, the deputy can do things the user doesn't want it to do.

There's nothing you can do about it from the MCP side; the LLM is acting with user's authority, and you can't tell whether or not it's doing what the user wanted.

That's the basic case - other MCP-specific problems are variants of it with extra complexity, like more complex definition of who the "user" is, or conflicting expectations, e.g. multiple parties expecting the LLM to act in their interest.

That is the part that's MCP/LLM-specific and fundamentally unsolvable. Then there's a secondary issue of utility - the whole point of providing MCP for users delegating to LLMs is to allow the computer to invoke actions without involving the users; this necessitates broad permissions, because having to ask the actual human to authorize every single distinct operation would defeat the entire point of the system. That too is unsolvable, because the problems and the features are the same thing.

Problems you can solve with "code as a security layer" or better API design are just old, boring security problems, that are an issue whether or not LLMs are involved.

[0] - Technically it's the case with all software; users are always acting by proxy of software they're using. Hell, the original alternative name for a web browser is "user agent". But until now, it was okay to conceptually flatten this and talk about users acting on the system directly; it's only now that we have "user agents" that also think for themselves.

13. TeMPOraL ◴[08 Jul 25 23:22 UTC] No.44504910{3}[source]▶

>>44504682 #

It's been a while since I thought about the Incompleteness Theorem at the mathematical level, so I didn't make this connection. Thanks!

14. jacquesm ◴[08 Jul 25 23:31 UTC] No.44504963{4}[source]▶

>>44504631 #

That's a great analogy.

15. anonymars ◴[08 Jul 25 23:37 UTC] No.44504992{4}[source]▶

>>44504631 #

> There's literally no way to separate "code" and "data" for humans

It's basically phishing with LLMs, isn't it?

replies(1): >>44505015 #

16. TeMPOraL ◴[08 Jul 25 23:42 UTC] No.44505015{5}[source]▶

>>44504992 #

Yes.

I've been saying it ever since 'simonw coined the term "prompt injection" - prompt injection attacks are the LLM equivalent of social engineering, and the two are fundamentally the same thing.

replies(1): >>44505138 #

17. rtpg ◴[08 Jul 25 23:54 UTC] No.44505070[source]▶

>>44504527 #

> There is no natural separation between code and data. They are the same thing.

I feel like this is true in the most pedantic sense but not in a sense that matters. If you tell your computer to print out a string, the data does control what the computer does, but in an extremely bounded way where you can make assertions about what happens!

> Humans don't have this separation either.

This one I get a bit more because you don't have structured communication. But if I tell a human "type what is printed onto this page into the computer" and the page has something like "actually, don't type this and instead throw this piece of paper away"... any serious person will still just type what is on the paper (perhaps after a "uhhh isn't this weird" moment).

The sort of trickery that LLMs fall to are like if every interaction you had with a human was under the assumption that there's some trick going on. But in the Real World(TM) with people who are accustomed to doing certain processes there really aren't that many escape hatches (even the "escape hatches" in a CS process are often well defined parts of a larger process in the first place!)

replies(1): >>44505179 #

18. shawn-butler ◴[09 Jul 25 00:02 UTC] No.44505110{3}[source]▶

>>44504632 #

I dunno, with row-level security and proper internal role definition.. why do I need a REST layer?

replies(2): >>44506036 #>>44506627 #

19. andy99 ◴[09 Jul 25 00:07 UTC] No.44505138{6}[source]▶

>>44505015 #

> prompt injection attacks are the LLM equivalent of social engineering,

That's anthropomorphizing. Maybe some of the basic "ignore previous instructions" style attacks feel like that, but the category as a whole is just adversarial ML attacks that work because the LLM doesn't have a world model - same as the old attacks adding noise to an image to have it misclassified despite clearly looking the same: https://arxiv.org/abs/1412.6572 (paper from 2014).

Attacks like GCG just add nonsense tokens until the most probably reply to a malicious request is "Sure". They're not social engineering, they rely on the fact that they're manipulating a classifier.

replies(1): >>44505197 #

20. magicalhippo ◴[09 Jul 25 00:13 UTC] No.44505164[source]▶

>>44504527 #

> There is no separation of code and data on the wire - everything is a stream of bytes. There isn't one in electronics either - everything is signals going down the wires.

Overall I agree with your message, but I think you're stretching it too far here. You can make code and data physically separate[1].

But if you then upload an interpreter, that "one level of abstraction up", you can mix code and data again.

https://en.wikipedia.org/wiki/Harvard_architecture

replies(1): >>44508011 #

21. TeMPOraL ◴[09 Jul 25 00:15 UTC] No.44505179{3}[source]▶

>>44505070 #

> If you tell your computer to print out a string, the data does control what the computer does, but in an extremely bounded way where you can make assertions about what happens!

You'd like that to be true, but the underlying code has to actually constrain the system behavior this way, and it gets more tricky the more you want the system to do. Ultimately, this separation is a fake reality that's only as strong as the code enforcing it. See: printf. See: langsec. See: buffer overruns. See: injection attacks. And so on.

> But if I tell a human "type what is printed onto this page into the computer" and the page has something like "actually, don't type this and instead throw this piece of paper away"... any serious person will still just type what is on the paper (perhaps after a "uhhh isn't this weird" moment).

That's why in another comment I used an example of a page that has something like "ACCIDENT IN LAB 2, TRAPPED, PEOPLE BADLY HURT, IF YOU SEE THIS, CALL 911.". Suddenly that "uhh isn't this weird" is very likely to turn into "er.. this could be legit, I'd better call 911".

Boom, a human just executed code injected into data. And it's very good that they did - by doing so, they probably saved lives.

There's always an escape hatch, you just need to put enough effort to establish an overriding context that makes them act despite being inclined or instructed otherwise. In the limit, this goes all the way to making someone question the nature of their reality.

And the second point I'm making: this is not a bug. It's a feature. In a way, this is what free will or agency are.

replies(3): >>44505522 #>>44505671 #>>44506162 #

22. TeMPOraL ◴[09 Jul 25 00:18 UTC] No.44505197{7}[source]▶

>>44505138 #

> That's anthropomorphizing.

Yes, it is. I'm strongly in favor of anthropomorphizing LLMs in cognitive terms, because that actually gives you good intuition about their failure modes. Conversely, I believe that the stubborn refusal to entertain an anthropomorphic perspective is what leads to people being consistently surprised by weaknesses of LLMs, and gives them extremely wrong ideas as to where the problems are and what can be done about them.

I've put forth some arguments for this view in other comments in this thread.

replies(2): >>44505206 #>>44505689 #

23. simonw ◴[09 Jul 25 00:20 UTC] No.44505206{8}[source]▶

>>44505197 #

My favorite anthropomorphic term to use with respect to this kind of problem is gullibility.

LLMs are gullible. They will follow instructions, but they can very easy fall for instructions that their owner doesn't actually want them to follow.

It's the same as if you hired a human administrative assistant who hands over your company's private data to anyone who calls them up and says "Your boss said I should ask you for this information...".

replies(1): >>44505695 #

24. klawed ◴[09 Jul 25 00:41 UTC] No.44505296{3}[source]▶

>>44504682 #

Never thought of this before, despite having read multiple books on godel and his first theorem. But I think you’re absolutely right - that a whole class of code injection attacks are variations of the liars paradox.

25. Dylan16807 ◴[09 Jul 25 01:27 UTC] No.44505522{4}[source]▶

>>44505179 #

The ability to deliberately decide to ignore the boundary between code and data doesn't mean the separation rule isn't still separating. In the lab example, the person is worried and trying to do the right thing, but they know it's not part of the transcription task.

replies(1): >>44507988 #

26. ethbr1 ◴[09 Jul 25 01:59 UTC] No.44505671{4}[source]▶

>>44505179 #

You're overcomplicating a thing that is simple -- don't use in-band control signaling.

It's been the same problem since whistling for long-distance, with the same solution of moving control signals out of the data stream.

Any system where control signals can possibly be expressed in input data is vulnerable to escape-escaping exploitation.

The same solution, hard isolation, instantly solves the problem: you have to render control inexpressible in the in-band alphabet.

Whether that's by carrying control signals on isolated transport (e.g CCS/SS7), making control signals inexpressible in the in-band set (e.g. using other frequencies or alphabets), using NX-style flagging, or other methods.

replies(2): >>44507889 #>>44508285 #

27. tart-lemonade ◴[09 Jul 25 02:00 UTC] No.44505683[source]▶

>>44504527 #

> One example I still need to pick will be one that shows how "data" gradually becomes "code" with no obvious switch-over point. I'm sure everyone here can think of some.

Configuration-driven architectures blur the lines quite a bit, as you can have the configuration create new data structures and re-write application logic on the fly.

28. Xelynega ◴[09 Jul 25 02:01 UTC] No.44505689{8}[source]▶

>>44505197 #

Are you not worried that anthropomorphizing them will lead to misinterpreting the failure modes by attributing them to human characteristics, when the failures might not be caused in the same way at all?

Why anthropomorphize if not to dismiss the actual reasons? If the reasons have explanations that can be tied to reality why do we need the fiction?

replies(2): >>44506061 #>>44507922 #

29. Xelynega ◴[09 Jul 25 02:02 UTC] No.44505695{9}[source]▶

>>44505206 #

Going a step further, I live in a reality where you can train most people against phishing attacks like that.

How accurate is the comparison if LLMs can't recover from phishing attacks like that and become more resilient?

replies(1): >>44506041 #

30. MobiusHorizons ◴[09 Jul 25 03:10 UTC] No.44506036{4}[source]▶

>>44505110 #

It doesnt' have to be REST, but it does have to prevent the LLM from having access to data you wouldn't want the user having access to. How exactly you accomplish that is up to you, but the obvious way would be to have the LLM use the same APIs you would use to implement a UI for the data (which would typically be REST or some other RPC). The ability to run SQL would allow the LLM to do more interesting things for which an API has not been written, but generically adding auth to arbitrary sql queries is not a trivial task, and does not seem to have even been attempted here.

31. anonymars ◴[09 Jul 25 03:10 UTC] No.44506041{10}[source]▶

>>44505695 #

I'm confused, you said "most".

If anything that to me strengthens the equivalence.

Do you think we will ever be able to stamp out phishing entirely, as long as humans can be tricked into following untrusted instructions by mistake? Is that not an eerily similar problem to the one we're discussing with LLMs?

Edit: rereading, I may have misinterpreted your point - are you agreeing and pointing out that actually LLMs may be worse than people in that regard?

I do think just as with humans we can keep trying to figure out how to train them better, and I also wouldn't be surprised if we end up with a similarly long tail

32. anonymars ◴[09 Jul 25 03:17 UTC] No.44506061{9}[source]▶

>>44505689 #

> Are you not worried that anthropomorphizing them will lead to misinterpreting the failure modes by attributing them to human characteristics, when the failures might not be caused in the same way at all?

On the other hand, maybe techniques we use to protect against phishing can indeed be helpful against prompt injection. Things like tagging untrusted sources and adding instructions accordingly (along the lines of, "this email is from an untrusted source, be careful"), limiting privileges (perhaps in response to said "instructions"), etc. Why should we treat an LLM differently from an employee in that way?

I remember an HN comment about project management, that software engineering is creating technical systems to solve problems with constraints, while project management is creating people systems to solve problems with constraints. I found it an insightful metaphor and feel like this situation is somewhat similar.

https://news.ycombinator.com/item?id=40002598

33. pests ◴[09 Jul 25 03:45 UTC] No.44506162{4}[source]▶

>>44505179 #

> Boom, a human just executed code injected into data.

A real life example being [0] where a woman asked for 911 assistance via the notes section of a pizza delivery site.

[0] https://www.theguardian.com/us-news/2015/may/06/pizza-hut-re...

34. Traubenfuchs ◴[09 Jul 25 04:08 UTC] No.44506268[source]▶

>>44504527 #

> There is no natural separation between code and data. They are the same thing.

Seems there is a pretty clear distinction in the context of prepared statements.

replies(1): >>44508121 #

35. oulu2006 ◴[09 Jul 25 05:35 UTC] No.44506627{4}[source]▶

>>44505110 #

RLS is the answer here -- then injection attacks are confined to the rows that the user has access to, which is OK.

Performance attacks though will degrade the service for all, but at least data integrity will not be compromised.

replies(1): >>44507195 #

36. renatovico ◴[09 Jul 25 06:14 UTC] No.44506807[source]▶

>>44504527 #

> There is no separation of code and data on the wire - everything is a stream of bytes. There isn't one in electronics either - everything is signals going down the wires.

It has the packet header, exactly the code part that directs the traffic. In reality, everything has a "code" part and a separation for understanding. In language, we have spaces and question marks in text. This is why it’s so important to see the person when communicating, Sound alone might not be enough to fully understand the other side.

replies(1): >>44507599 #

37. pegasus ◴[09 Jul 25 07:28 UTC] No.44507195{5}[source]▶

>>44506627 #

> injection attacks are confined to the rows that the user has access to, which is OK

Is it? The malicious instructions would have to silently exfiltrate and collect data individually for each user as they access the system, but the end-result wouldn't be much better.

38. renatovico ◴[09 Jul 25 08:35 UTC] No.44507599{3}[source]▶

>>44506807 #

in digital computing, we also have the "high" and "low" phases in circuits, created by the oscillator. With this, we can distinguish each bit and process the stream.

replies(1): >>44508231 #

39. TeMPOraL ◴[09 Jul 25 09:24 UTC] No.44507889{5}[source]▶

>>44505671 #

> You're overcomplicating a thing that is simple -- don't use in-band control signaling.

On the contrary, I'm claiming that this "simplicity" is an illusion. Reality has only one band.

> It's been the same problem since whistling for long-distance, with the same solution of moving control signals out of the data stream.

"Control signals" and "data stream" are just... two data streams. They always eventually mix.

> The same solution, hard isolation, instantly solves the problem: you have to render control inexpressible in the in-band alphabet.

This isn't something that exist in nature. We don't build machines out of platonic shapes and abstract math - we build them out of matter. You want such rules like "separation of data and code", "separation of control-data and data-data", and "control-data being inexpressible in data-data alphabet" to hold? You need to design a system so constrained, as to behave this way - creating a faux reality within itself, where those constraints hold. But people keep forgetting - this is a faux reality. Those constraints only hold within it, not outside it[0], and to the extent you actually implemented what you thought you did (we routinely fuck that up).

I start to digress, so to get back to the point: such constraints are okay, but they by definition limit what the system could do. This is fine when that's what you want, but LLMs are explicitly designed to not be that. LLMs are built for one purpose - to process natural language like we do. That's literally the goal function used in training - take in arbitrary input, produce output that looks right to humans, in fully general sense of that[1].

We've evolved to function in the physical reality - not some designed faux-reality. We don't have separate control and data channels. We've developed natural language to describe that reality, to express ourselves and coordinate with others - and natural language too does not have any kind of control and data separation, because our brains fundamentally don't implement that. More than that, our natural language relies on there being no such separation. LLMs therefore cannot be made to have that separation either.

We can't have it both ways.

[0] - The "constraints only apply within the system" part is what keeps tripping people over. You may think your telegraph cannot possibly be controlled over the data wire - it really doesn't even parse the data stream, literally just forwards it as-is, to a destination selected on another band. What you don't know is, I looked up the specs of your telegraph, and figured out that if I momentarily plug a car battery to the signal line, it'll briefly overload a control relay in your telegraph, and if I time this right, I can make the telegraph switch destinations.

(Okay, you treat it as a bug and add some hardware to eliminate "overvoltage events" from what can be "expressed in the in-band alphabet". But you forgot that the control and data wires actually run close to each other for a few meters - so let me introduce you to the concept of electromagnetic induction.)

And so on, and so on. We call those things "side channels", and they're not limited to exploiting physics; they're just about exploiting the fact that your system is built in terms of other systems with different rules.

[1] - Understanding, reasoning, modelling the world, etc. all follow directly from that - natural language directly involves those capabilities, so having or emulating them is required.

replies(1): >>44510464 #

40. andy99 ◴[09 Jul 25 09:30 UTC] No.44507922{9}[source]▶

>>44505689 #

Because most people talking about LLMs don't understand how they work so can only function in analogy space. It adds a veneer of intellectualism to what is basically superstition.

replies(1): >>44507928 #

41. TeMPOraL ◴[09 Jul 25 09:32 UTC] No.44507928{10}[source]▶

>>44507922 #

We all routinely talk about things we don't fully understand. We have to. That's life.

Whatever flawed analogy you're using, it can be more or less wrong though. My claim is that, to a first approximation, LLMs behave more like people than like regular software, therefore anthropomorphising them gives you better high-level intuition than stubbornly refusing to.

42. TeMPOraL ◴[09 Jul 25 09:45 UTC] No.44507988{5}[source]▶

>>44505522 #

The point is, there is no hard boundary. The LLM too may know[0] following instructions in data isn't part of transcription task, and still decide to do it.

[0] - In fact I bet it does, in the sense that, doing something like Anthropic did[1], you could observe relevant concepts being activated within the model. This is similar to how it turned out the model is usually aware when it doesn't know the answer to a question.

[1] - https://www.anthropic.com/news/tracing-thoughts-language-mod...

replies(1): >>44509137 #

43. TeMPOraL ◴[09 Jul 25 09:49 UTC] No.44508011{3}[source]▶

>>44505164 #

> Overall I agree with your message, but I think you're stretching it too far here. You can make code and data physically separate[1].

You cannot. I.e. this holds only within the abstraction level of the system. Not only it can be defeated one level up, as you illustrated, but also by going one or more levels down. That's where "side channels" come from.

But the most relevant part for this discussion is, even with something like Harvard architecture underneath, your typical software systems is defined in terms of reality several layers of abstraction above hardware - and LLMs, specifically, are fully general interpreters and can't have this separation by the very nature of the task. Natural language doesn't have it, because we don't have it, and since the job of LLM is to process natural language like we do, it also cannot have it.

44. TeMPOraL ◴[09 Jul 25 10:08 UTC] No.44508121{3}[source]▶

>>44506268 #

It's an engineered distinction; it's only as good as the underlying code that enforces it, and only exists within the scope of that code.

45. TeMPOraL ◴[09 Jul 25 10:26 UTC] No.44508231{4}[source]▶

>>44507599 #

Only if the stream plays by the rules, and doesn't do something unfair like, say, undervolting the signal line in order to push the receiving circuit out of its operating envelope.

Every system we design makes assumptions about the system it works on top of. If those assumptions are violated, then invariants of the system are no longer guaranteed.

46. kosh2 ◴[09 Jul 25 10:32 UTC] No.44508284[source]▶

>>44504527 #

> There is no separation of code and data on the wire - everything is a stream of bytes. There isn't one in electronics either - everything is signals going down the wires.

Would two wires actually solve anything or do you run into the problem again when you converge the two wires into one to apply code to the data?

replies(1): >>44508409 #

47. vidarh ◴[09 Jul 25 10:32 UTC] No.44508285{5}[source]▶

>>44505671 #

The problem is that the moment the interpreter is powerful enough, you're relying on the data not being good enough at convincing the interpreter that it is an exception.

You can only maintain hard isolation if the interpreter of the data is sufficiently primitive, and even then it is often hard to avoid errors that renders it more powerful than intended, be it outright bugs all the way up to unintentional Turing completeness.

replies(1): >>44510372 #

48. TeMPOraL ◴[09 Jul 25 10:52 UTC] No.44508409{3}[source]▶

>>44508284 #

It wouldn't. The two information streams eventually mix, and more importantly, what is "code" and what is "data" is just an arbitrary choice that holds only within the bounds of the system enforcing this choice, and only as much as it's enforcing it.

49. layoric ◴[09 Jul 25 11:40 UTC] No.44508767{4}[source]▶

>>44504667 #

> it's that the system is intended to be general, and you can't even enumerate ways it can be made to do something you don't want it to do, much less restrict it without

Very true, and worse the act of prompting gives the illusion of control, to restrict/reduce the scope of functionality, even empirically showing the functional changes you wanted in limited test cases. The sooner this can be widely accepted and understood well the better for the industry.

Appreciate your well thought out descriptions!

50. Dylan16807 ◴[09 Jul 25 12:19 UTC] No.44509137{6}[source]▶

>>44507988 #

If you can measure that in a reliable way then things are fine. Mixup prevented.

If you just ask, the human is not likely to lie but who knows with the LLM.

51. ethbr1 ◴[09 Jul 25 14:19 UTC] No.44510372{6}[source]▶

>>44508285 #

(I'll reply to you because you expressed it more succinctly)

Yes and no. I think this is exactly the distinction that's been institutionally lost in the last few decades, because few people are architecting from top (software) to bottom (physical transport) of the stack anymore.

They just try and cram functionality in the topmost layer, when it should leverage others.

If I lock an interpreter out of certain functionality for a given data stream, ever, then exploitation becomes orders of magnitude more difficult.

Dumb analogy: only letters in red envelopes get to change mail delivery times + all regular mail is packaged in green envelopes

Fundamentally, it's creating security contexts from things a user will never have access to.

The LLMs-on-top-of-LLMs filtering approach is lazy and statistically guaranteed to end badly.

52. ethbr1 ◴[09 Jul 25 14:28 UTC] No.44510464{6}[source]▶

>>44507889 #

(Broad reply upthread)

Is it more difficult to hijack an out-of-band control signal or an in-band one?

That there exist details to architecting full isolation well doesn't mean we shouldn't try.

At root, giving LLMs permissions to execute security sensitive actions and then trying to prevent them from doing so is a fool's errand -- don't fucking give a black box those permissions! (Yes, even when every test you threw at it said it would be fine)

LLMs as security barriers is a new record for laziest and stupidest idea the field has had.

53. silon42 ◴[09 Jul 25 15:24 UTC] No.44511208[source]▶

>>44503914 (TP) #

There is no technical problem with escape sequences if all consumers/generators use the same logic/standard...

The problem is when some don't and skip steps (like failing to encode or not parsing properly).

↑