Think -- why has nobody been able to make an LLM play Go better than AlphaZero while still retaining language capabilities? It certainly would have orders of magnitude more parameters.
Think -- why has nobody been able to make an LLM play Go better than AlphaZero while still retaining language capabilities? It certainly would have orders of magnitude more parameters.
Search is a great tool for AI, but you have to have a reasonable search space (like chess or go or poker). I'm not close enough to Lean to understand if it can do something like that (i believe not), but math doesn't feel like a thing where you can reasonably "search" next steps, for most problems.
Naively, it would seem like transformers could line up nicely with turn-based games. Instead of mapping tokens to language as in an LLM, they could map to valid moves given the current game state. And then instead of optimizing the next token for linguistic coherence as LLMs do, you optimize for winning the game.
However, such reward system is not available for LLM in an open domain setting.