←back to thread

145 points jakozaur | 3 comments | | HN request time: 0s | source
Show context
xcf_seetan ◴[] No.45670626[source]
>attackers can exploit local LLMs

I thought that local LLMs means they run on local computers, without being exposed to the internet.

If an attacker can exploit a local LLM, means it already compromised you system and there are better things they can do than trick the LLM to get what they can get directly.

replies(4): >>45670663 #>>45671212 #>>45671663 #>>45672038 #
1. SAI_Peregrinus ◴[] No.45671663[source]
LLMs don't have any distinction between instructions & data. There's no "NX" bit. So if you use a local LLM to process attacker-controlled data, it can contain malicious instructions. This is what Simon Willson's "prompt injection" means: attackers can inject a prompt via the data input. If the LLM can run commands (i.e. if it's an "agent") then prompt injection implies command execution.
replies(2): >>45671702 #>>45671736 #
2. tintor ◴[] No.45671702[source]
NX bit doesn’t work for LLMs. Data and instruction tokens are mixed up in higher layers and NX bit is lost.
3. DebtDeflation ◴[] No.45671736[source]
>LLMs don't have any distinction between instructions & data

And this is why prompt injection really isn't a solvable problem on the LLM side. You can't do the equivalent of (grep -i "DROP TABLE" form_input). What you can do is not just blindly execute LLM generated code.