Isn't generating the training data by running Stockfish on every board position in every game just encoding the search tree into the transformer model?
So increasing the number of parameters in the model would just let it encode more of the search tree and perform better, which doesn't seem all that interesting.
replies(1):