
MCP is eating the world

(www.stainless.com)
335 points by emschwartz | 8 comments
0x500x79 ◴[] No.44367530[source]
I believe that MCP is a bit over-marketed.

MCP allows you to bring tools to agents you don't control. It's awesome, but it isn't the right match for every problem. If you believe the hype of X/LinkedIn you would think that MCP everywhere is going to be the solution.

Bringing tools to your local Claude client is awesome, but there are still challenges with MCP that need to be solved and like all technology, it isn't applicable universally.

Not to mention it's a recipe for burning tokens!

replies(6): >>44367740 #>>44367752 #>>44367901 #>>44367968 #>>44368623 #>>44371990 #
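The "burning tokens" point comes from the fact that every advertised tool schema is sent along with the prompt, whether or not the tool is ever called. A rough back-of-the-envelope sketch (the tool schemas here are made up, and the 4-characters-per-token ratio is a crude heuristic, not any provider's real tokenizer):

```python
import json

# Hypothetical tool schemas, roughly the JSON shape MCP servers advertise.
# These are illustrative stand-ins, not real tools.
TOOLS = [
    {
        "name": f"tool_{i}",
        "description": "Does something useful with the given input. " * 3,
        "inputSchema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "What to look up."}
            },
            "required": ["query"],
        },
    }
    for i in range(20)
]

def estimate_tokens(obj) -> int:
    """Crude estimate: ~4 characters per token for JSON-ish English text."""
    return len(json.dumps(obj)) // 4

per_tool = estimate_tokens(TOOLS[0])
total = estimate_tokens(TOOLS)
print(f"~{per_tool} tokens per tool, ~{total} tokens for {len(TOOLS)} tools")
```

Since the schemas ride along on every request, the overhead scales with the number of tools you expose, not with how often they are actually invoked.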
1. theOGognf ◴[] No.44367752[source]
Along with burning tokens, the way MCP servers are run and managed is resource-wasteful. Running a whole Docker container just to have some model call a single API? Want to call a small CLI utility? People say to run another Docker container for that, too.

Feels like a monolith would be better

replies(3): >>44368026 #>>44368074 #>>44368296 #
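The "monolith" alternative being suggested is one process hosting many small tools behind a shared dispatch table, rather than one container per tool. A toy sketch of that idea (illustrative only; this is not the real MCP SDK or wire protocol):

```python
# One process, many tools: register handlers in a shared registry and
# route calls in-process instead of spinning up a container per tool.
from typing import Callable

TOOLS: dict[str, Callable[..., str]] = {}

def tool(name: str):
    """Decorator that registers a function in the shared tool registry."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = fn
        return fn
    return register

@tool("echo")
def echo(text: str) -> str:
    return text

@tool("word_count")
def word_count(text: str) -> str:
    return str(len(text.split()))

def dispatch(name: str, **kwargs) -> str:
    """Route a tool call to the right handler, all in one process."""
    return TOOLS[name](**kwargs)

print(dispatch("word_count", text="hello brave new world"))  # → 4
```

The trade-off, as the thread notes below, is that an in-process monolith gives up the plug-into-any-agent composability that is MCP's main selling point.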
2. MaxLeiter ◴[] No.44368026[source]
Remote MCPs should resolve some of this
3. MyOutfitIsVague ◴[] No.44368074[source]
A "whole Docker container" is not very heavyweight. Other than having their own filesystem view and separate shared libraries, container processes are nearly as light as non-container processes. It's not like running a VM.
replies(2): >>44368289 #>>44381725 #
4. jcelerier ◴[] No.44368289[source]
> Other than having their own filesystem view and separate shared libraries, container processes are nearly as light as non-container processes. It's not like running a VM.

why does the smallest script take seconds to even start then?

replies(2): >>44368313 #>>44371146 #
5. stingraycharles ◴[] No.44368296[source]
I don’t think running these commands in a docker container is the standard way of doing this, I’ve seen “npx” et al being used way more often.

Furthermore, the “docker” part wouldn’t even be the most resource wasteful if you consider the general computational costs of LLMs.

The selling point of MCP servers is that they are composable and plug into any AI agent. A monolith doesn’t achieve that, unless I’m misunderstanding things.

What I find annoying is that it’s very unpredictable when exactly an LLM will actually invoke an MCP tool function. Different LLM providers’ models behave differently, and even within the same provider different models behave differently.

E.g. it’s surprisingly difficult to get an AI agent to actually use a language server to retrieve relevant information about source code, and it’s even more difficult to figure out a prompt for all language server functions that works reliably across all models.

And I guess that’s because of the fuzzy nature of it all.

I’m waiting to see how this all matures, I have the highest expectations of Anthropic with this. OpenAI seems to be doing their own thing (although ChatGPT supposedly will come with MCP support soon). Google’s models appear to be the most eager to actually invoke MCP functions, but they invoke them way too much, in turn causing a lot of context to get wasted / token noise.
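One common mitigation for unpredictable invocation is writing the tool description to spell out exactly when the model should reach for it. A hedged sketch of what a language-server-style tool definition might look like in the JSON Schema shape MCP uses (the tool name and fields here are illustrative, not from any real server):

```python
import json

# Illustrative only: a tool definition whose description tells the model
# *when* to call it, not just what it does. Names are made up.
find_references = {
    "name": "find_references",
    "description": (
        "Use this whenever the user asks where a function, class, or "
        "variable is used. Prefer this over grepping the source text: "
        "it queries the language server, so results are semantically exact."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "symbol": {
                "type": "string",
                "description": "Fully qualified symbol name.",
            },
            "file": {
                "type": "string",
                "description": "Path of the file declaring the symbol.",
            },
        },
        "required": ["symbol"],
    },
}

print(json.dumps(find_references, indent=2))
```

Even with descriptions like this, different providers' models weigh them differently, which is the fuzziness the comment above is describing.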

6. stingraycharles ◴[] No.44368313{3}[source]
That is not normal. Small scripts should launch in milliseconds, not several seconds.
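For comparison, process creation itself is cheap. A quick sketch timing the spawn of a no-op Python child process (timings will vary by machine, but on typical hardware this lands in the tens of milliseconds):

```python
import subprocess
import sys
import time

# Time how long it takes to spawn and reap a tiny child process --
# the "smallest script". This should be milliseconds, not seconds.
start = time.perf_counter()
subprocess.run([sys.executable, "-c", "pass"], check=True)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"spawned a no-op interpreter in {elapsed_ms:.1f} ms")
```

Multi-second startup usually points at something other than process creation: image pulls, a VM booting on macOS, or a cold `npx` install fetching dependencies.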
7. antonvs ◴[] No.44371146{3}[source]
Below is a transcript of a "smallest script" which runs in 328 ms total on my machine. And that includes loading an ubuntu OS image, which could be optimized depending on what dependencies the script needs.

Of course, if you're invoking it on a remote cluster, there are many reasons it can take longer, especially if the cluster has to scale to satisfy the request. But all those reasons have nothing to do with containers specifically - it's things like cluster capacity, node provisioning time, container pull time, network latency, etc. If you architect things properly, you can get the number below plus network latency even for a remote cluster invocation.

    $ time docker run ubuntu echo hello world  
    hello world  

    real    0m0.328s  
    user    0m0.011s  
    sys     0m0.010s
8. Nextgrid ◴[] No.44381725[source]
It really depends on whether the parent is on Linux (where your description would be correct) or on macOS, where running a container involves first booting a Linux VM under the hood and having it exchange files/network traffic with the host with tons of overhead (not to mention the pile of shit Electron-based Docker Desktop app, which feels like it takes as much memory as the Linux VM itself).