The initial LLM acts as an intent-detection switch.
To personify the LLM far too much:
It sees that a prompt of some kind wants to play chess.
Knowing this it looks at the bag of “tools” and sees a chess tool.
It then generates a response which eventually causes a call to a chess AI (or just a chess program) which does further processing.
The first LLM acts as a ton of if-then statements, but automatically generated (or brute-force discovered) through training.
You still need discrete parts for this system: some communication protocol, an intent-detection step, a chess-execution step, etc…
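The routing described above can be sketched in a few lines. This is a toy illustration, not anyone's real implementation: the LLM's intent detection is faked with a keyword match (in a real system that string would come from a model completion), and all function and tool names here are made up for the example.

```python
def detect_intent(prompt: str) -> str:
    """Stand-in for the first LLM: map a prompt to a tool name.

    A real system would get this label from a model completion;
    here it is a hard-coded keyword check, purely for illustration.
    """
    if "chess" in prompt.lower():
        return "chess"
    return "general"


def chess_tool(prompt: str) -> str:
    # Stand-in for handing off to a chess engine / chess AI.
    return "chess engine handles: " + prompt


def general_tool(prompt: str) -> str:
    # Fallback when no specialized tool matches.
    return "plain answer for: " + prompt


# The "bag of tools" the router looks at.
TOOLS = {"chess": chess_tool, "general": general_tool}


def handle(prompt: str) -> str:
    # The pile of if-then statements, made explicit as lookup + call.
    return TOOLS[detect_intent(prompt)](prompt)


print(handle("Let's play chess, e4"))
```

The point of the sketch is that whether `detect_intent` is a statistical model or a rule table, the surrounding machinery (protocol, dispatch table, tool execution) looks the same.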
I don’t see how that differs from a classic expert system other than the if statement is handled by a statistical model.