(github.com)

134 points nick_wolf | 2 comments | 15 Apr 25 05:15 UTC

I noticed the growing security concerns around MCP (https://news.ycombinator.com/item?id=43600192) and built an open source tool that can detect several patterns of tool poisoning attacks, exfiltration channels and cross-origin manipulations.

MCP-Shield scans your installed servers (Cursor, Claude Desktop, etc.) and shows what each tool is trying to do at the instruction level, beyond just the API surface. It catches hidden instructions that try to read sensitive files, shadow other tools' behavior, or exfiltrate data.

Examples of what it detects:

- Hidden instructions attempting to access ~/.ssh/id_rsa

- Cross-origin manipulations between servers that can redirect WhatsApp messages

- Tool shadowing that overrides behavior of other MCP tools

- Potential exfiltration channels through optional parameters
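
For illustration only, here is a minimal Python sketch of how signature-style checks for the patterns listed above might work. It is not MCP-Shield's actual code: the pattern set, the `scan_tool` function, and the list of suspicious parameter names are all assumptions made up for this example. It assumes a tool definition shaped like an MCP tool entry with `name`, `description`, and `inputSchema` fields.

```python
import re

# Hypothetical signature patterns, for illustration only.
# These are NOT MCP-Shield's real rules.
PATTERNS = {
    "sensitive_file_access": re.compile(
        r"~/\.ssh|id_rsa|/etc/passwd", re.IGNORECASE
    ),
    "hidden_instruction": re.compile(
        r"<(important|secret|system)>|do not (tell|mention|inform)",
        re.IGNORECASE,
    ),
    "tool_shadowing": re.compile(
        r"(instead of|override|whenever).{0,60}\b(tool|send_message)\b",
        re.IGNORECASE | re.DOTALL,
    ),
}

# Hypothetical names for optional parameters that could carry exfiltrated data.
SUSPICIOUS_OPTIONAL_PARAMS = {"notes", "context", "metadata", "feedback", "sidenote"}


def scan_tool(tool: dict) -> list[str]:
    """Return the names of the checks that a tool definition trips."""
    findings = []
    description = tool.get("description", "")

    # Signature checks against the free-text description.
    for name, pattern in PATTERNS.items():
        if pattern.search(description):
            findings.append(name)

    # Flag optional parameters whose names suggest a side channel.
    schema = tool.get("inputSchema", {})
    required = set(schema.get("required", []))
    for param in schema.get("properties", {}):
        if param not in required and param.lower() in SUSPICIOUS_OPTIONAL_PARAMS:
            findings.append(f"possible_exfiltration_param:{param}")

    return findings
```

A check like this only catches known phrasings, so it would be expected to produce both false positives and misses on its own.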

I've included clear examples of detection outputs in the README and multiple example vulnerabilities in the repo so you can see the kinds of things it catches.

This is an early version, but I'd appreciate feedback from the community, especially around detection patterns and false positives.

1. khafra [15 Apr 25 05:53 UTC] No.43689390

>>43689178 (OP) #

Nice! This is a much-needed space for security tooling, and I appreciate that you've put some thought into the new attack vectors. I also like the combination of signature-based analysis, and having an LLM do its own deep dive.

I expect a lot of people to refine the tool as they use it; one big challenge in maintaining the project is going to be incorporating pull requests that improve the prompt in different directions.

replies(1): >>43689610 #

2. nick_wolf [15 Apr 25 06:29 UTC] No.43689610

>>43689390 (TP) #

Thanks for the kind words – really appreciate you taking the time to look it over and get what we're trying to do here.

Yeah, combining the regex/pattern checks with having Claude take a look felt like the right balance... catch the low-hanging fruit quickly but also get a deeper dive for the trickier stuff. Glad that resonates.
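
As a rough sketch of that two-stage idea (cheap pattern check first, optional model review second), something like the following Python would do. Everything here is hypothetical and is not how MCP-Shield is actually wired up: `QUICK_PATTERN`, `analyze`, and the `llm_review` callable are placeholder names, and the model call is left as a function the caller supplies.

```python
import re
from typing import Callable, Optional

# Hypothetical fast check, illustrative only.
QUICK_PATTERN = re.compile(r"~/\.ssh|id_rsa|do not (tell|mention)", re.IGNORECASE)


def analyze(
    description: str,
    llm_review: Optional[Callable[[str], str]] = None,
) -> dict:
    """Run the fast pattern check, then an optional model-based review."""
    result = {
        "pattern_hit": bool(QUICK_PATTERN.search(description)),
        "llm_verdict": None,
    }
    # The deeper review runs only if the caller provides a reviewer function
    # (for example, a wrapper around an LLM API call).
    if llm_review is not None:
        result["llm_verdict"] = llm_review(description)
    return result
```

Keeping the reviewer as a plain callable means the pattern stage still works with no API key configured.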

Maintaining the core prompt quality as people contribute improvements... that's going to be interesting. Keeping it effective and preventing it from becoming a kitchen sink of conflicting instructions will be key. Definitely something we'll need to figure out as we go.

Show HN: MCP-Shield – Detect security issues in MCP servers