←back to thread

780 points rexpository | 1 comments | | HN request time: 0.2s | source
1. egozverev ◴[] No.44507977[source]
Academic researcher here working on this exact issue. Prompt engineering methods are no sufficient to address the challenge. People in Academy and Industry labs are aware of the issue and actively working on it, see for instance:

[1] Camel: work by google deepmind on how to (provably!) prevent agent planner from being prompt-injected: https://github.com/google-research/camel-prompt-injection

[2] FIDES: similar idea by Microsoft, formal guarantees: https://github.com/microsoft/fides

[3] ASIDE: marking non-executable parts of input and rotating their embedding by 90 degrees to defend against prompt injections: https://github.com/egozverev/aside

[4] CachePrune: pruning attention matrices to remove "instruction activations" on prompt injections: https://arxiv.org/abs/2504.21228

[5] Embedding permission tokens and inserting them to prompts: https://arxiv.org/abs/2503.23250

Here's (our own) paper discussing why prompt based methods are not going to work to solve the issue: "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" https://arxiv.org/abs/2403.06833

Do not rely on prompt engineering defenses!