What if instead of just lots of text fed to an LLM we have a data structure with trusted and untrusted data.
Any response on a call to a web search or MCP is considered untrusted by default (tunable if you also wrote the MCP and trust it).
The you limit tbe operations on untrusted data to pure transformations, no side effects.
E.g. run an LLM to summarize, or remove whitespace, convert to float etc. All these done in a sandbox without network access.
For example:
"Get me all public github issues on this repo, summarise and store in this DB."
Although the command reads public information untrusted and has DB access it will only process the untrusted information in a tight sandbox and so this can be done securely. I think!
(by database access, I'm assuming you'd be planning to ask the LLM to write SQL code which this system would run)
Instead, you would ask your LLM to create an object containing the structured data about those github issues (ID, title, description, timestamp, etc) and then you would run a separate `storeGitHubIssues()` method that uses prepared statements to avoid SQL injection.
You could also get the LLM to "vibe code" the SQL. Tbis is somewhat dangerous as the LLM might make mistakes, but the main thing I am talking about hete is how not to be "influenced" by text in data and so be susceptible to that sort of attack.
Yes, this can be done safely.
If you think of it through the "lethal trifecta" framing, to stay safe from data stealing attacks you need to avoid having all three of exposure to untrusted content, exposure to private data and an exfiltration vector.
Here you're actually avoiding two out of them: - there's no private data (just public issue access) and no mechanism that can exfiltrate, so the worst a malicious instruction can do is cause incorrect data to rewritten to your database.
You have to be careful when designing that sandboxed database tool but that's not too hard too get right.