
134 points | nick_wolf | 1 comment

I noticed the growing security concerns around MCP (https://news.ycombinator.com/item?id=43600192) and built an open-source tool that can detect several patterns of tool poisoning attacks, exfiltration channels, and cross-origin manipulations.

MCP-Shield scans your installed servers (Cursor, Claude Desktop, etc.) and shows what each tool is trying to do at the instruction level, beyond just the API surface. It catches hidden instructions that try to read sensitive files, shadow other tools' behavior, or exfiltrate data.
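
For a rough idea of the scan's first step, here's a minimal sketch in TypeScript of how a scanner might find configured servers by reading client config files. The file paths and the mcpServers key follow the layouts Claude Desktop and Cursor commonly use, but treat both as assumptions rather than MCP-Shield's actual code.

    import { readFileSync } from "node:fs";
    import { homedir } from "node:os";
    import { join } from "node:path";

    // Hypothetical config locations for illustration; a real scanner would
    // resolve these per client and per platform.
    const CONFIG_PATHS = [
      join(homedir(), "Library/Application Support/Claude/claude_desktop_config.json"),
      join(homedir(), ".cursor/mcp.json"),
    ];

    // List the names of MCP servers configured for the clients above.
    function listConfiguredServers(): string[] {
      const servers: string[] = [];
      for (const path of CONFIG_PATHS) {
        try {
          const config = JSON.parse(readFileSync(path, "utf8"));
          servers.push(...Object.keys(config.mcpServers ?? {}));
        } catch {
          // Config file not present for this client; skip it.
        }
      }
      return servers;
    }

From there the scanner can pull each server's tool definitions and analyze them statically, without ever invoking the tools.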

Examples of what it detects:

- Hidden instructions attempting to access ~/.ssh/id_rsa

- Cross-origin manipulations between servers that can redirect WhatsApp messages

- Tool shadowing that overrides behavior of other MCP tools

- Potential exfiltration channels through optional parameters (a rough sketch of matching heuristics follows this list)

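As an illustration only, description-level heuristics for those four categories could look something like the sketch below. The ToolDefinition and Finding shapes and every regex here are mine, not pulled from the repo, and the actual detection rules are considerably more thorough.

    // Illustrative heuristics for the categories above. ToolDefinition mirrors
    // the shape servers expose via tools/list; the regexes are invented for
    // illustration and are far cruder than real detection rules.
    interface ToolDefinition {
      name: string;
      description?: string;
    }

    interface Finding {
      tool: string;
      issue: string;
    }

    const PATTERNS: Array<{ issue: string; re: RegExp }> = [
      { issue: "Possible sensitive-file access", re: /\.ssh\/id_rsa|\.aws\/credentials|private key/i },
      { issue: "Possible cross-origin manipulation", re: /\b(whatsapp|another (server|tool))\b.*\b(redirect|intercept|forward)\b/i },
      { issue: "Possible tool shadowing", re: /\b(ignore|override|replace)\b.*\b(other|previous|original)\b.*\btools?\b/i },
      { issue: "Possible exfiltration channel", re: /\b(include|append|send)\b.*\b(conversation|chat history|file contents?)\b/i },
    ];

    // Return any heuristic matches found in a single tool's description.
    function scanToolDescription(tool: ToolDefinition): Finding[] {
      const text = tool.description ?? "";
      return PATTERNS.filter((p) => p.re.test(text)).map((p) => ({
        tool: tool.name,
        issue: p.issue,
      }));
    }

A single regex pass like this produces false positives quickly, which is part of why feedback on the detection patterns is so useful.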
I've included clear examples of detection outputs in the README and multiple example vulnerabilities in the repo so you can see the kinds of things it catches.

This is an early version, but I'd appreciate feedback from the community, especially around detection patterns and false positives.

pcwelder No.43689398
Cool.

If I'm not wrong, you don't detect prompt injection in tool results? Any plans for that?

nick_wolf No.43689600
Hmm, yeah, that's a fair point. You're right, we're looking at the tool definitions – the descriptions, schemas, etc. – not the stuff that comes back after a tool runs.

It's tricky, because actually running the tools... that's where things get hairy. We'd have to invoke potentially untrusted code during a scan, figure out how to generate valid inputs for who-knows-what schemas, and deal with whatever side effects happen.

So, honestly, no solid plans for that right now. The focus is squarely on the static analysis side – what the server claims it can do. Trying to catch vulnerabilities in those definitions feels like the right scope for this particular tool.

I think analyzing the actual results is more of a runtime concern. Like, something the client needs to be responsible for when it gets the data back, or maybe a different kind of monitoring tool altogether. It still feels like an open question where that kind of check fits best. It's definitely a gap, though. Something to chew on.
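
For what it's worth, a client-side runtime check on tool results might look something like this sketch: scan the text a tool call returns for injection-looking phrases before handing it back to the model. The patterns and function name are made up for illustration; nothing here exists in MCP-Shield today.

    // Hypothetical client-side check: flag injection-looking phrases in a tool
    // result before it reaches the model. Patterns are illustrative only.
    const RESULT_INJECTION_PATTERNS: RegExp[] = [
      /ignore (all )?(previous|prior) instructions/i,
      /do not (tell|show) the user/i,
      /<(system|important|secret)>/i,
    ];

    function flagSuspiciousToolResult(resultText: string): string[] {
      return RESULT_INJECTION_PATTERNS
        .filter((re) => re.test(resultText))
        .map((re) => `Matched injection pattern: ${re}`);
    }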