Consider a Python function signature:

from typing import Literal, Optional

def list_containers(
    show_stopped: bool = False,
    name_pattern: Optional[str] = None,
    sort: Literal["size", "name", "started_at"] = "name",
): ...

It doesn't even need docs.
Now convert this to a JSON schema, and the input is already about 4x larger.
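For a concrete sense of the blow-up, here's roughly what that schema looks like in the common OpenAI-style function-calling format (a sketch; exact conventions vary by provider):

{
  "name": "list_containers",
  "parameters": {
    "type": "object",
    "properties": {
      "show_stopped": {"type": "boolean", "default": false},
      "name_pattern": {"type": ["string", "null"]},
      "sort": {"type": "string", "enum": ["size", "name", "started_at"], "default": "name"}
    },
    "required": []
  }
}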
And when generating output, the LLM emits almost 2x more tokens too, because JSON is verbose. More tokens, more chances to get confused.
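Compare what the model actually has to write. The Python call is one short line, while the JSON tool call (OpenAI-style, where "arguments" is itself a JSON-encoded string) forces nested quoting and escaping:

list_containers(show_stopped=True, sort="size")

{"name": "list_containers", "arguments": "{\"show_stopped\": true, \"sort\": \"size\"}"}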
And consider that the flow of calling Python functions and using their output to call other functions is seen 1000x more often in training data, whereas JSON tool-calling flows are rare and practically only exist in the instruction tuning phase. And I'm sure that training data also contains far more complex code examples, where the model has to execute genuinely complex logic.
Then there's the whole issue of composition. To my knowledge, there's no way an LLM can do this in one response:
vehicle = call_func_1()
if vehicle.type == "car":
    details = lookup_car(vehicle.reg_no)
elif vehicle.type == "motorcycle":
    details = lookup_motorcycle(vehicle.reg_no)
How is JSON tool calling going to solve this?
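The only answer I can see is to serialize it into multiple round trips, something like this (a sketch, with made-up values):

assistant: {"name": "call_func_1", "arguments": "{}"}
tool:      {"type": "car", "reg_no": "ABC-123"}
assistant: {"name": "lookup_car", "arguments": "{\"reg_no\": \"ABC-123\"}"}

Every branch costs a full model invocation, and the if/elif logic is never written down anywhere; the model has to re-derive it on each turn.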