Consider a Python function signature:

from typing import Literal, Optional

def list_containers(
    show_stopped: bool = False,
    name_pattern: Optional[str] = None,
    sort: Literal["size", "name", "started_at"] = "name",
): ...

It doesn't even need docs.
Now convert this to a JSON schema, and the input is already about 4x larger.
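For a concrete sense of the blow-up, here's roughly what that schema looks like in the common OpenAI-style function-calling format (a sketch; exact conventions vary by provider):

{
  "name": "list_containers",
  "parameters": {
    "type": "object",
    "properties": {
      "show_stopped": {"type": "boolean", "default": false},
      "name_pattern": {"type": ["string", "null"]},
      "sort": {"type": "string", "enum": ["size", "name", "started_at"], "default": "name"}
    },
    "required": []
  }
}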
And when generating output, the LLM emits almost 2x more tokens too, because JSON is verbose. More tokens, more chances to get confused.
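Compare what the model actually has to write. The Python call is one short line, while the JSON tool call (OpenAI-style, where "arguments" is itself a JSON-encoded string) forces nested quoting and escaping:

list_containers(show_stopped=True, sort="size")

{"name": "list_containers", "arguments": "{\"show_stopped\": true, \"sort\": \"size\"}"}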
And consider that the flow of calling Python functions and using their output to call other functions is seen 1000x more often in training data, whereas JSON tool-calling flows are rare and practically only exist in the instruction tuning phase. And I'm sure that training data also contains far more complex code examples, where the model has to execute genuinely complex logic.
Then there's the whole issue of composition. To my knowledge, there's no way an LLM can do this in one response:
vehicle = call_func_1()
if vehicle.type == "car":
    details = lookup_car(vehicle.reg_no)
elif vehicle.type == "motorcycle":
    details = lookup_motorcycle(vehicle.reg_no)
How is JSON tool calling going to solve this?
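The only answer I can see is to serialize it into multiple round trips, something like this (a sketch, with made-up values):

assistant: {"name": "call_func_1", "arguments": "{}"}
tool:      {"type": "car", "reg_no": "ABC-123"}
assistant: {"name": "lookup_car", "arguments": "{\"reg_no\": \"ABC-123\"}"}

Every branch costs a full model invocation, and the if/elif logic is never written down anywhere; the model has to re-derive it on each turn.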