Even MCP still relies on tool/function calling in LLMs, which is just a fine-tuned/trained behaviour with zero guarantees.
It relies on training the model so that the probability of outputting the python_call token followed by the correct tool is relatively high. But that probability is still p < 1.
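A minimal sketch of what that implies on the host side (the JSON shape, tool names, and raw completion here are illustrative, not any specific protocol's wire format): because the tool call is just sampled text, the host has to parse and validate it and be ready for it to be malformed or reference a tool that doesn't exist.

```python
import json

# Hypothetical raw completion from a model fine-tuned to emit tool calls as
# JSON. Nothing in decoding guarantees this structure; it's only likely
# because the fine-tune made these token sequences high-probability.
raw_completion = '{"tool": "get_weather", "arguments": {"city": "Berlin"}}'

REGISTERED_TOOLS = {"get_weather", "search_web"}  # tools the host actually exposes


def parse_tool_call(text: str):
    """Parse and validate a model-emitted tool call; return None on any failure."""
    try:
        call = json.loads(text)  # the model may emit malformed JSON
    except json.JSONDecodeError:
        return None
    if call.get("tool") not in REGISTERED_TOOLS:
        return None  # the model may hallucinate a tool name
    if not isinstance(call.get("arguments"), dict):
        return None  # or mangle the argument structure
    return call


call = parse_tool_call(raw_completion)
if call is None:
    print("No valid tool call; fall back to plain-text handling.")
else:
    print(f"Dispatching {call['tool']} with {call['arguments']}")
```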