The article gives a PDF document as an example, but depending on how links are opened and stored for Notion agents, threat actors could serve a different web page depending on the crawler/browser agent.

That means any industry-known documentation that seems good for bookmarking can be a good target.

7. lacoolj ◴[19 Sep 25 23:20 UTC] No.45307967[source]▶

>>45307095 (OP) #

This attack was demonstrated a couple years ago, it's not really a new thing.

https://simonwillison.net/2023/Oct/14/multi-modal-prompt-inj...

replies(2): >>45309716 #>>45311544 #

8. tadfisher ◴[19 Sep 25 23:43 UTC] No.45308140[source]▶

>>45307095 (OP) #

Is anyone working on the instruction/data-conflation problem? We're extremely premature in hooking up LLMs to real data sources and external functions if we can't keep them from following instructions in the data. Notion in particular shows absolutely zero warnings to end users, and encourages them to connect GitHub, GMail, Jira, etc. to the model. At this point it's basically criminal to treat this as a feature of a secure product.

replies(4): >>45308229 #>>45309698 #>>45310081 #>>45310871 #

9. abirag ◴[19 Sep 25 23:54 UTC] No.45308229[source]▶

>>45308140 #

Hey, I’m the author of this exploit. At CodeIntegrity.ai, we’ve built a platform that visualizes each of the control flows and data flows of an agentic AI system connected to tools to accurately assess each of the risks. We also provide runtime guardrails that give control over each of these flows based on your risk tolerance.

Feel free to email me at abi@codeintegrity.ai — happy to share more

10. chanw ◴[20 Sep 25 00:01 UTC] No.45308282[source]▶

>>45307095 (OP) #

This was a great article, because it demonstrated the vuln in a practical way and wasn't overly technical either. Thanks for sharing

11. memothon ◴[20 Sep 25 00:27 UTC] No.45308473[source]▶

>>45307850 #

Lots of companies have automations with Zapier etc. to upload things like invoices or other documents directly to notion. Or someone gets emailed a document with an exploit and they upload it.

12. PokestarFan ◴[20 Sep 25 00:35 UTC] No.45308540{3}[source]▶

>>45307876 #

If I had to describe it, Notion is if somehow managed to combine OneNote and Excel. Of interest is the fact that the "database" system stores each row as a page with the column values other than title stored in a special way. Of course, this also means that it doesn't scale at all, but I have seen some crazy use cases (an example is replacing Jira).

13. filearts ◴[20 Sep 25 01:04 UTC] No.45308754[source]▶

>>45307095 (OP) #

It is fascinating how similar the prompt construction was to a phishing campaign in terms of characteristics.

  - Authority assertion
  - False urgency
  - Technical legitimacy
  - Security theater

Prompt injection here is like a phishing campaign against an entity with no consciousness or ability to stop and question through self-reflection.

replies(2): >>45309747 #>>45310870 #

14. simonw ◴[20 Sep 25 01:10 UTC] No.45308804[source]▶

>>45307850 #

In this case by emailing you a PDF with a convincing title that you might want to share with your coworkers - the malicious instructions are hidden as white text on a white background.

There are plenty of other possibilities though, especially once you start booking up MCPs that can see public issue trackers or incoming emails.

15. mcapodici ◴[20 Sep 25 02:33 UTC] No.45309698[source]▶

>>45308140 #

The way you worded tbat is good and got me thinking.

What if instead of just lots of text fed to an LLM we have a data structure with trusted and untrusted data.

Any response on a call to a web search or MCP is considered untrusted by default (tunable if you also wrote the MCP and trust it).

The you limit tbe operations on untrusted data to pure transformations, no side effects.

E.g. run an LLM to summarize, or remove whitespace, convert to float etc. All these done in a sandbox without network access.

For example:

"Get me all public github issues on this repo, summarise and store in this DB."

Although the command reads public information untrusted and has DB access it will only process the untrusted information in a tight sandbox and so this can be done securely. I think!

16. judge2020 ◴[20 Sep 25 02:36 UTC] No.45309716[source]▶

>>45307967 #

The problem is that this was a vulnerability in Notion without any mitigations or safeguards against it.

17. XenophileJKO ◴[20 Sep 25 02:39 UTC] No.45309747[source]▶

>>45308754 #

I'm fairly convinced that with the right training.. the ability of the LLM to be "skeptical" and resilient to these kinds of attacks will be pretty robust.

The current problem is that making the models resistant to "persona" injection is in opposition to much of how the models are also used conversationally. I think this is why you'll end up with hardened "agent" models and then more open conversational models.

I suppose it is also possible that the models can have an additional non-prompt context applied that sets expectations, but that requires new architecture for those inputs.

replies(1): >>45309999 #

18. BrenBarn ◴[20 Sep 25 03:18 UTC] No.45309997[source]▶

>>45307095 (OP) #

It's hard to call any vulnerability "hidden" when it occurs in a tool that openly claims to be "AI".

19. BarryMilo ◴[20 Sep 25 03:18 UTC] No.45309999{3}[source]▶

>>45309747 #

Isn't the whole problem that it's nigh-impossible to isolate context from input?

replies(1): >>45311690 #

20. simonw ◴[20 Sep 25 03:33 UTC] No.45310081[source]▶

>>45308140 #

We've been talking about this problem for three years and there's not been much progress in finding a robust solution.

Current models have a separation between system prompts and user-provided prompts and are trained to follow one more than the other, but it's not bulletproof-proof - a suitably determined attacker can always find an attack that can override the system instructions.

So far the most convincing mitigation I've seen is still the DeepMind CaMeL paper, but it's very intrusive in terms of how it limits what you can build: https://simonwillison.net/2025/Apr/11/camel/

replies(1): >>45311555 #

21. furyofantares ◴[20 Sep 25 03:41 UTC] No.45310124[source]▶

>>45307095 (OP) #

> The "lethal trifecta," as described by Simon Willison, is the combination of LLM agents, tool access, and long-term memory that together enable powerful but easily exploitable attack vectors.

This is a terrible description of the lethal trifecta, it lists 3 things but they are not the trifecta. The trifecta happens to be contained in the things listed in this (and other) examples but it's stated as if the trifecta is listed here, when it is not.

The trifecta is: access to your private data, exposure to untrusted content, and the ability to externally communicate. Web search as tool for an LLM agent is both exposure to untrusted content and the ability to externally communicate.

replies(3): >>45310342 #>>45310512 #>>45310722 #

22. swyx ◴[20 Sep 25 04:23 UTC] No.45310342[source]▶

>>45310124 #

yeah TFA gets it wrong. source: https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/

replies(1): >>45310351 #

23. gnabgib ◴[20 Sep 25 04:25 UTC] No.45310351{3}[source]▶

>>45310342 #

This post started there https://news.ycombinator.com/item?id=45307452 .. yes a different link, but this was originally linked to a simonw tweet, and he linked elsewhere.

24. empiko ◴[20 Sep 25 04:55 UTC] No.45310512[source]▶

>>45310124 #

In my opinion, the trifecta can be reduced further to a simple statement: an attacker who can input into your LLM can control all its resources.

25. Kevcmk ◴[20 Sep 25 05:40 UTC] No.45310722[source]▶

>>45310124 #

This isn’t the trifecta.

It’s:

* Untrusted input

* Privileged access

* Exfiltration vector

26. freakynit ◴[20 Sep 25 06:10 UTC] No.45310870[source]▶

>>45308754 #

Pretty similar in spirit to CSRF:

Both trick a privileged actor into doing something the user didn't intend using inputs the system trusts.

In this case, a malicious PDF that uses prompt-injection to get a Notion agent (which already has access to your workspace) to call an external web-tool and exfiltrate page content. Tjhis is simialr to CSRF's core idea - an attacker causes an authenticated principal to make a request - except here the "principal" is an autonomous agent with tool access rather than the browser carrying cookies.

Thus, same abuse-of-privilege pattern, just with different technical surface (prompt-injection + tool chaining vs. forged browser HTTP requests).

27. jrm4 ◴[20 Sep 25 06:10 UTC] No.45310871[source]▶

>>45308140 #

Is anyone working on the "allowing non-root users to run executable code" problem?

well then

28. freakynit ◴[20 Sep 25 06:33 UTC] No.45310975[source]▶

>>45307850 #

Google "best free notion marketing templates" and then import them. I have done them multiple times, and so does 1000's of others woldwide.

29. freakynit ◴[20 Sep 25 06:35 UTC] No.45310985{3}[source]▶

>>45307876 #

Notion is like the "dump-truck" of everything lol.

30. jhealy ◴[20 Sep 25 08:32 UTC] No.45311544[source]▶

>>45307967 #

Not really a new vulnerability, and yet Notion just shipped it this week. All caution thrown to the wind in the name of an announce-able AI feature

31. proto-n ◴[20 Sep 25 08:34 UTC] No.45311555{3}[source]▶

>>45310081 #

I really don't see why it's not possible to just use basically a "highlighter" token which is added to all the authoritative instructions and not to data. Should be very fast for the model to learn it during rlhf or similar.

32. Terr_ ◴[20 Sep 25 09:09 UTC] No.45311690{4}[source]▶

>>45309999 #

Yeah, ultimately the LLM is guess_what_could_come_next(document) in a loop with some I/O either doing something with the latest guess or else appending more content to the document from elsewhere.

Any distinctions inside the document involve the land of statistical patterns and weights, rather than hard auditable logic.

↑