←back to thread

145 points jakozaur | 8 comments | | HN request time: 0s | source | bottom
1. TedDallas ◴[] No.45670559[source]
It is like SQL injection. Probably worse. If you are using unsupervised data for context that ultimately generates executable code you will have this security problem. Duh.
replies(1): >>45670618 #
2. philipwhiuk ◴[] No.45670618[source]
Worse because there's really no equivalent to prepared statements.
replies(1): >>45671189 #
3. charcircuit ◴[] No.45671189[source]
Sure there is. A common way is to have the LLM generate things like {name} which will get substituted for the user's name instead of trying to get the LLM itself to generate the user's name.
replies(1): >>45674025 #
4. wat10000 ◴[] No.45674025{3}[source]
Parameterized queries allow you to provide untrusted input to the database in a way that's guaranteed not to be interpreted as instructions.

There's nothing like that for LLMs.

replies(1): >>45674077 #
5. charcircuit ◴[] No.45674077{4}[source]
That's what I explained. You are trying to do something with an untrusted name and the LLM will not treat the name as instructions because it doesn't see the actual name.
replies(1): >>45674721 #
6. wat10000 ◴[] No.45674721{5}[source]
You mentioned having the LLM generate a placeholder, whereas the important thing is what it accepts. You can feed an LLM nothing but placeholders but that's very limited since it can't see the the actual data in any way. You're really just having it emit a template. Something simple like "make a calendar event for the reservation in this email" could not be done. In contrast, parameterized queries let the database actually operate on the data.
replies(1): >>45675042 #
7. charcircuit ◴[] No.45675042{6}[source]
It may be limited but that doesn't mean it's not similar. For example MySQL can't check the weather when given city string as a paramertized query, but that doesn't mean MySQL doesn't have parameterized queries.
replies(1): >>45675128 #
8. wat10000 ◴[] No.45675128{7}[source]
Querying external information is a different category of thing altogether.

The key thing (really, the only thing) about parameterized queries is that they allow you to provide code and data with a hard separation between the two.

LLMs don't have anything of the sort. They only take in one kind of thing. They don't even have a notion of code versus data that you could separate, or fail to separate. All you can do is either tolerate it sometimes taking instructions from the stuff you want treated as "data," or never give it anything you consider "data." You propose this second one. But never giving it "data" is very different from a feature that allows you to provide arbitrary data with total safety.