- Authority assertion
- False urgency
- Technical legitimacy
- Security theater
Prompt injection here is like a phishing campaign against an entity with no consciousness or ability to stop and question through self-reflection. - Authority assertion
- False urgency
- Technical legitimacy
- Security theater
Prompt injection here is like a phishing campaign against an entity with no consciousness or ability to stop and question through self-reflection.The current problem is that making the models resistant to "persona" injection is in opposition to much of how the models are also used conversationally. I think this is why you'll end up with hardened "agent" models and then more open conversational models.
I suppose it is also possible that the models can have an additional non-prompt context applied that sets expectations, but that requires new architecture for those inputs.
Both trick a privileged actor into doing something the user didn't intend using inputs the system trusts.
In this case, a malicious PDF that uses prompt-injection to get a Notion agent (which already has access to your workspace) to call an external web-tool and exfiltrate page content. Tjhis is simialr to CSRF's core idea - an attacker causes an authenticated principal to make a request - except here the "principal" is an autonomous agent with tool access rather than the browser carrying cookies.
Thus, same abuse-of-privilege pattern, just with different technical surface (prompt-injection + tool chaining vs. forged browser HTTP requests).
Any distinctions inside the document involve the land of statistical patterns and weights, rather than hard auditable logic.