You're right that per-request LLM inference is costly, especially at scale. But I think the bet with MCP (Model Context Protocol) isn't that every interaction always hits the LLM; it's that the LLM bootstraps flexible protocol negotiation, and the client can then cache or formalize the result.
Think of it more like a just-in-time protocol generator. The LLM works out how the two sides should talk once, and the rest of the interaction can run over that negotiated path without further inference, roughly like the sketch below. Expensive up front, but potentially cheaper than baking in a rigid API for every use case.
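To make the idea concrete, here's a minimal, hypothetical sketch of the "negotiate once, then cache" pattern. This is not actual MCP client code; the `negotiate_with_llm` function, the canned schema it returns, and `call_tool` are all placeholders I made up so the example runs without a model.

```python
import json
from functools import lru_cache


def negotiate_with_llm(tool_description: str) -> dict:
    """Placeholder for the expensive LLM step that turns a natural-language
    tool description into a concrete call schema. Stubbed with a canned
    result so the sketch is runnable; a real client would call a model here."""
    return {
        "method": "POST",
        "path": "/v1/search",
        "params": {"query": "string", "limit": "int"},
    }


@lru_cache(maxsize=None)
def get_call_schema(tool_description: str) -> str:
    # Pay the LLM cost at most once per tool description;
    # every later call for the same description hits the cache.
    return json.dumps(negotiate_with_llm(tool_description))


def call_tool(tool_description: str, **kwargs) -> dict:
    schema = json.loads(get_call_schema(tool_description))
    # From here on it's ordinary, deterministic plumbing: validate kwargs
    # against schema["params"], build the request, send it, etc.
    return {"schema": schema, "args": kwargs}


if __name__ == "__main__":
    # First call triggers the (stubbed) negotiation; the second reuses it.
    print(call_tool("full-text search over product docs", query="rate limits", limit=5))
    print(call_tool("full-text search over product docs", query="auth", limit=3))
    print(get_call_schema.cache_info())  # shows one miss, one hit
```

The point of the sketch is just the shape: the LLM sits on the slow path that produces the schema, and everything after that is cache lookups and plain request-building.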
Still, lots of unknowns, especially around latency, consistency, and cost.