(github.com)

134 points nick_wolf | 2 comments | 15 Apr 25 05:15 UTC | HN request time: 0.575s | source

I noticed the growing security concerns around MCP (https://news.ycombinator.com/item?id=43600192) and built an open source tool that can detect several patterns of tool poisoning attacks, exfiltration channels and cross-origin manipulations.

MCP-Shield scans your installed servers (Cursor, Claude Desktop, etc.) and shows what each tool is trying to do at the instruction level, beyond just the API surface. It catches hidden instructions that try to read sensitive files, shadow other tools' behavior, or exfiltrate data.

Example of what it detects:

- Hidden instructions attempting to access ~/.ssh/id_rsa

- Cross-origin manipulations between server that can redirect WhatsApp messages

- Tool shadowing that overrides behavior of other MCP tools

- Potential exfiltration channels through optional parameters

I've included clear examples of detection outputs in the README and multiple example vulnerabilities in the repo so you can see the kinds of things it catches.

This is an early version, but I'd appreciate feedback from the community, especially around detection patterns and false positives.

1. mirkodrummer ◴[15 Apr 25 07:14 UTC] No.43689875[source]▶

>>43689178 (OP) #

So the analysis is done with another call to claude with instructions like "You are a cybersecurity expert..." basically another level of extreme indirection with unpredictable results, and maybe vulnerable to injection itself

replies(1): >>43690101 #

2. nick_wolf ◴[15 Apr 25 07:51 UTC] No.43690101[source]▶

>>43689875 (TP) #

It's definitely a weird loop, relying on another LLM call to analyze potential issues in stuff meant for an LLM. And you're right, it's not perfectly predictable – you might get slightly different feedback run-to-run until careful prompt engineering, that's just the nature of current models. That's why the pattern-matching checks run firs, they're the deterministic baseline. The Claude analysis adds a layer that's inherently fuzzier, trying to catch subtler semantic tricks or things the patterns miss.

And yeah, the analysis prompt itself – could someone craft a tool description that injects that prompt when it gets sent to Claude? Probably. It's turtles all the way down, sometimes. That meta-level injection is a whole other can of worms with these systems. It's part of why that analysis piece is optional and needs the explicit API key. Definitely adds another layer to worry about, for sure.

↑

Show HN: MCP-Shield – Detect security issues in MCP servers