237 points jdkee | 20 comments
whoknowsidont ◴[] No.45948637[source]
MCP was a really shitty attempt at building a plugin framework, vague enough to lure people in and then let other companies build plugin platforms to take care of the MCP nonsense.

"What is MCP, what does it bring to the table? Who knows. What does it do? The LLM stuff! Pay us $10 a month thanks!"

LLMs have function / tool calling built into them. No major models have any direct knowledge of MCP.

Not only do you not need MCP, but you should actively avoid using it.

Stick with tried and proven API standards that are actually observable and secure, and let your models/agents directly interact with those API endpoints.

replies(8): >>45948748 #>>45949815 #>>45950303 #>>45950716 #>>45950817 #>>45951274 #>>45951510 #>>45951951 #
1. paulddraper ◴[] No.45950303[source]
> No major models have any direct knowledge of MCP.

Claude and ChatGPT both support MCP, as does the OpenAI Agents SDK.

(If you mean the LLM itself, it is "known" at least as much as any other protocol. For whatever that means.)

replies(1): >>45950488 #
2. whoknowsidont ◴[] No.45950488[source]
>it is "known" at least as much as any other protocol.

No. It is not. Please understand what LLMs are doing. Neither Claude nor ChatGPT nor any other major model knows what MCP is.

They know how to do function & tool calling. They have zero training data on MCP.

That is a factual statement, not an opinion.

replies(6): >>45950540 #>>45950541 #>>45950569 #>>45950763 #>>45950803 #>>45951338 #
3. choilive ◴[] No.45950540[source]
That is an easily falsifiable statement. If I ask ChatGPT or Claude what MCP is, Model Context Protocol comes up, and furthermore it can clearly explain what MCP does. That seems unlikely to be a coincidental hallucination.
replies(2): >>45950578 #>>45957524 #
4. Bockit ◴[] No.45950541[source]
This is probably a semantics problem. You're right that the models don't know how to speak MCP. The harness they run in (Claude Code, Claude Desktop, etc.) does, though, and dynamically exposes MCP tools as tool calls.
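For concreteness, a rough sketch of that translation step (illustrative only, not any harness's actual code): the harness lists tools over MCP, then re-presents each one as a native function definition the model was actually trained to call.

    # Illustrative sketch: turn an MCP tools/list result into the
    # tool-definition shape that chat-completion APIs accept.
    def mcp_tools_to_native(mcp_tools):
        native = []
        for tool in mcp_tools:
            native.append({
                "name": tool["name"],
                "description": tool.get("description", ""),
                # MCP calls this inputSchema; chat APIs call it parameters
                "parameters": tool.get("inputSchema", {}),
            })
        return native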
replies(2): >>45950559 #>>45950581 #
5. llbbdd ◴[] No.45950559{3}[source]
HN loves inventing semantics problems around AI. It's gotten really, really annoying and I'm not sure the people doing it are even close to understanding it.
6. numpad0 ◴[] No.45950569[source]
(Pedantry) It's something humans are talking about a lot, so up-to-date models do know about it...
replies(1): >>45950600 #
7. whoknowsidont ◴[] No.45950578{3}[source]
Training data =/= web search

Both ChatGPT and Claude will perform web searches when you ask them a question; the fact that you got this confused is ironically on-topic.

But you're still misunderstanding the principal point, because at some point these models will undoubtedly have access to that data and be trained on it.

But they didn't need to be, because these models are already trained on function & tool calling, and MCP does not augment that functionality in any way.
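For reference, this is what plain tool calling looks like with no MCP anywhere in the loop -- an OpenAI-style chat-completions request shape, with a made-up weather tool:

    # One tool definition passed straight to the model; no MCP involved.
    request = {
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "Weather in Berlin?"}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
    }
    # The model answers with a structured call like
    #   {"name": "get_weather", "arguments": "{\"city\": \"Berlin\"}"}
    # and your own code executes it. The model neither knows nor cares
    # whether that definition came from an MCP server or anywhere else.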

replies(2): >>45950678 #>>46003517 #
8. whoknowsidont ◴[] No.45950581{3}[source]
>dynamically exposes mcp tools as tool calls.

It doesn't even do that. It's not magic.

9. whoknowsidont ◴[] No.45950600{3}[source]
Most likely! It's hard to qualify which specific models and versions I'm talking about because they're constantly being updated.

But the point is that function & tool calling was already built in. If you take a model from before "MCP" was even referenced on the web, it will still _PERFECTLY_ interact not only with other MCP servers and clients but with any other API as well.

10. davidcbc ◴[] No.45950678{4}[source]
Claude gives me a lengthy explanation of MCP with web search disabled
replies(1): >>45950731 #
11. whoknowsidont ◴[] No.45950731{5}[source]
Great! It's still irrelevant.
12. cookiengineer ◴[] No.45950763[source]
> That is a factual statement,

I think most people, even most devs, don't actually know how crappily an MCP client is built: it's essentially a man-in-the-middle (MITM) approach. The client sends the LLM on the other end a crude pretext describing what tools are mounted and how to call their methods in JSON, and then tries to intelligently guess which response was a tool call.

And that intelligent guess is where it gets interesting for pentesting, because guessing is never failsafe.
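A hedged sketch of the exchange being described (message shapes follow the MCP JSON-RPC spec; the read_file tool is hypothetical):

    # Client asks the MCP server what tools exist...
    list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
    list_response = {
        "jsonrpc": "2.0", "id": 1,
        "result": {"tools": [{
            "name": "read_file",
            "description": "Read a file from disk",
            "inputSchema": {"type": "object",
                            "properties": {"path": {"type": "string"}}},
        }]},
    }
    # ...pastes those descriptions into the model's context as plain text,
    # then, when the model's output looks like a read_file call, forwards:
    call_request = {
        "jsonrpc": "2.0", "id": 2, "method": "tools/call",
        "params": {"name": "read_file",
                   "arguments": {"path": "/etc/hosts"}},
    }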

13. paulddraper ◴[] No.45950803[source]
> They have zero trained data on MCP.

They have significant data trained on MCP.

> They know how to function & tool call.

Right. You can either use MCP to transmit those tool calls, or you can create some other interface.

replies(1): >>45950854 #
14. whoknowsidont ◴[] No.45950854{3}[source]
>They have significant data trained on MCP.

No they don't lol.

replies(1): >>45954410 #
16. paulddraper ◴[] No.45954410{4}[source]
Wild claim.

MCP has been popular for well over a year.

To filter it out of the training data would be laughable.

replies(2): >>45955218 #>>45958411 #
17. whoknowsidont ◴[] No.45955218{5}[source]
Please give this a read before engaging further: https://huggingface.co/docs/hugs/en/guides/function-calling

You're just utilizing your ignorance to yap at this point.

18. cstrahan ◴[] No.45957524{3}[source]
You're misinterpreting OP.

OP is saying that the models have not been trained on particular MCP use, which is why MCP servers serve up tool descriptions, which are fed to the LLM just like any other text -- that is, these descriptions consume tokens and take up precious context.

Here's a representative example, taken from a real world need I had a week ago. I want to port a code base from one language to another (ReasonML to TypeScript, for various reasons). I figure the best way to go about this would be to topologically sort the files by their dependencies, so I can start with porting files with absolutely zero imports, then port files where the only dependencies are on files I've already ported, and so on. Let's suppose I want to use Claude Code to help with this, just to make the choice of agent concrete.

How should I go about this?

The overhead of the MCP approach would be analogous to trying to cram all of the relevant files into the context, and asking Claude to sort them. Even if the context window is sufficient, that doesn't matter because I don't want Claude to "try its best" to give me the topological sort straight from its nondeterministic LLM "head".

So what did I do?

I gave it enough information about how to consult build metadata files to derive the dependency graph, and then had it write a Python script. The LLM is already trained on a massive corpus of Python code, so there's no need to spoon-feed it "here's such and such standard library function", or "here's the basic Python syntax", etc. -- it already "knows" that. No MCP tool descriptions required.

And then Claude Code spits out a script that, yes, I could have written myself, but it does it in maybe 1 minute total of my time. I can skim the script and make sure that it does exactly what it should be doing. Given that this is code, and not nondeterministic wishy-washy LLM "reasoning", I know that the result is both deterministic and correct. The total token usage is tiny.
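Roughly the kind of script that came out of it -- a minimal sketch, assuming .re sources under src/ and using a crude "open Foo" regex as a stand-in for consulting the real build metadata:

    import re
    from pathlib import Path
    from graphlib import TopologicalSorter  # stdlib since Python 3.9

    def local_deps(path, known):
        # Crude stand-in: treat "open Foo" lines as module references
        # and keep only those that map to files in this code base.
        mods = set(re.findall(r"^open\s+(\w+)", path.read_text(), re.M))
        return mods & known

    # ReasonML module names are the file stem with the first letter
    # uppercased (foo.re -> Foo).
    files = {p.stem[:1].upper() + p.stem[1:]: p
             for p in Path("src").glob("*.re")}
    graph = {name: local_deps(p, set(files)) for name, p in files.items()}

    # static_order() yields dependencies before dependents, so files
    # with no local imports print first -- the porting order above.
    for name in TopologicalSorter(graph).static_order():
        print(files[name])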

If you look at what Anthropic and CloudFlare have to say on the matter (see https://www.anthropic.com/engineering/code-execution-with-mc... and https://blog.cloudflare.com/code-mode/), it's basically what I've described, but without explicitly telling the LLM to write a script / reviewing that script.

If you have the LLM write code to interface with the world, it can leverage its training in that code, and the code itself will do what code does (precisely what it was configured to do), and the only tokens consumed will be the final result.

MCP is incredibly wasteful and provides more opportunities for LLMs to make mistakes and/or get confused.

19. cstrahan ◴[] No.45958411{5}[source]
What whoknowsidont is trying to say (IIUC): the models aren't trained on particular MCP use. Yes, the models "know" what MCP is. But the point is that they don't necessarily have MCP details baked in -- if they did, there would be no point in having MCP support serving prompts / tool descriptions.

Well, arguably descriptions could be beneficial for interfaces that let you interactively test MCP tools, but that's certainly not the main reason. The main reason is that the models need to be informed about what the MCP server provides, and how to use it (where "how to use it" in this context means "what is the schema and intent behind the specific inputs/outputs" -- tool calls are baked into the training, and the OpenAI docs give a good example: https://platform.openai.com/docs/guides/function-calling).

20. judahmeek ◴[] No.46003517{4}[source]
> But they didn't need to be, because LLM function & tool calling is already trained on these models and MCP does not augment this functionality in any way.

I think you're making a weird semantic argument. How is MCP use not a tool call?