(quesma.com)

145 points jakozaur | 3 comments | 22 Oct 25 12:48 UTC | HN request time: 0s | source

Show context

xcf_seetan ◴[22 Oct 25 15:29 UTC] No.45670626[source]▶

>attackers can exploit local LLMs

I thought that local LLMs means they run on local computers, without being exposed to the internet.

If an attacker can exploit a local LLM, means it already compromised you system and there are better things they can do than trick the LLM to get what they can get directly.

replies(4): >>45670663 #>>45671212 #>>45671663 #>>45672038 #

1. SAI_Peregrinus ◴[22 Oct 25 16:35 UTC] No.45671663[source]▶

>>45670626 #

LLMs don't have any distinction between instructions & data. There's no "NX" bit. So if you use a local LLM to process attacker-controlled data, it can contain malicious instructions. This is what Simon Willson's "prompt injection" means: attackers can inject a prompt via the data input. If the LLM can run commands (i.e. if it's an "agent") then prompt injection implies command execution.

replies(2): >>45671702 #>>45671736 #

2. tintor ◴[22 Oct 25 16:39 UTC] No.45671702[source]▶

>>45671663 (TP) #

NX bit doesn’t work for LLMs. Data and instruction tokens are mixed up in higher layers and NX bit is lost.

3. DebtDeflation ◴[22 Oct 25 16:41 UTC] No.45671736[source]▶

>>45671663 (TP) #

>LLMs don't have any distinction between instructions & data

And this is why prompt injection really isn't a solvable problem on the LLM side. You can't do the equivalent of (grep -i "DROP TABLE" form_input). What you can do is not just blindly execute LLM generated code.

↑

The security paradox of local LLMs