Hidden risk in Notion 3.0 AI agents: Web search tool abuse for data exfiltration

(www.codeintegrity.ai)

156 points abirag | 4 comments | 19 Sep 25 21:49 UTC | HN request time: 0.704s | source

Show context

tadfisher ◴[19 Sep 25 23:43 UTC] No.45308140[source]▶

Is anyone working on the instruction/data-conflation problem? We're extremely premature in hooking up LLMs to real data sources and external functions if we can't keep them from following instructions in the data. Notion in particular shows absolutely zero warnings to end users, and encourages them to connect GitHub, GMail, Jira, etc. to the model. At this point it's basically criminal to treat this as a feature of a secure product.

replies(4): >>45308229 #>>45309698 #>>45310081 #>>45310871 #

1. mcapodici ◴[20 Sep 25 02:33 UTC] No.45309698[source]▶

>>45308140 #

The way you worded tbat is good and got me thinking.

What if instead of just lots of text fed to an LLM we have a data structure with trusted and untrusted data.

Any response on a call to a web search or MCP is considered untrusted by default (tunable if you also wrote the MCP and trust it).

The you limit tbe operations on untrusted data to pure transformations, no side effects.

E.g. run an LLM to summarize, or remove whitespace, convert to float etc. All these done in a sandbox without network access.

For example:

"Get me all public github issues on this repo, summarise and store in this DB."

Although the command reads public information untrusted and has DB access it will only process the untrusted information in a tight sandbox and so this can be done securely. I think!

replies(2): >>45311866 #>>45313574 #

2. sebastiennight ◴[20 Sep 25 09:48 UTC] No.45311866[source]▶

>>45309698 (TP) #

You definitely do not need or want to give database access to an LLM-with-scaffolding system to execute the example you provided.

(by database access, I'm assuming you'd be planning to ask the LLM to write SQL code which this system would run)

Instead, you would ask your LLM to create an object containing the structured data about those github issues (ID, title, description, timestamp, etc) and then you would run a separate `storeGitHubIssues()` method that uses prepared statements to avoid SQL injection.

replies(1): >>45312560 #

3. mcapodici ◴[20 Sep 25 11:58 UTC] No.45312560[source]▶

>>45311866 #

Yes this. What you said is what I meant.

You could also get the LLM to "vibe code" the SQL. Tbis is somewhat dangerous as the LLM might make mistakes, but the main thing I am talking about hete is how not to be "influenced" by text in data and so be susceptible to that sort of attack.

4. simonw ◴[20 Sep 25 14:10 UTC] No.45313574[source]▶

>>45309698 (TP) #

"Get me all public github issues on this repo, summarise and store in this DB."

Yes, this can be done safely.

If you think of it through the "lethal trifecta" framing, to stay safe from data stealing attacks you need to avoid having all three of exposure to untrusted content, exposure to private data and an exfiltration vector.

Here you're actually avoiding two out of them: - there's no private data (just public issue access) and no mechanism that can exfiltrate, so the worst a malicious instruction can do is cause incorrect data to rewritten to your database.

You have to be careful when designing that sandboxed database tool but that's not too hard too get right.

↑