If you solve chess, you end up with a game tree that is far too large for us to compute today (on the order of 10^120 by Shannon's estimate, although my memory may be off). Annotating every node of that tree with win / loss / draw would allow an optimal player without search. The two obvious approaches to compression / optimization are to approximate the tree and to approximate the annotations. How well either approach works depends a lot on the structure of the tree.
This result seems to tell us less about the power of the training approach (in absolute terms) and more about how amenable the chess game tree is to those two approaches (in relative terms). What I would take away is that a reasonable approximation of that tree can be made in 270M words of data.
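To make the "annotated tree, no search at play time" idea concrete, here is a minimal sketch on tic-tac-toe, which is small enough to solve exactly. It is only an illustration of the uncompressed case, not anything from the result being discussed; the board encoding (a 9-character string) and the names `annotate`, `best_move` and `table` are assumptions made up for this example.

```python
# Toy illustration: solve tic-tac-toe once, store win/loss/draw for every
# reachable position, then play optimally by table lookup alone.
# All names and the board encoding here are hypothetical, chosen for the sketch.

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if that side has completed a line, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

def to_move(board):
    """X moves first, so X is to move whenever the piece counts are equal."""
    return "X" if board.count("X") == board.count("O") else "O"

def children(board):
    """All positions reachable in one move by the side to move."""
    p = to_move(board)
    return [board[:i] + p + board[i + 1:]
            for i, c in enumerate(board) if c == "."]

# position -> +1 (win), 0 (draw), -1 (loss), from the side to move's viewpoint
table = {}

def annotate(board):
    """Fill `table` for `board` and everything below it (done once, offline)."""
    if board in table:
        return table[board]
    if winner(board) is not None:
        value = -1  # the previous player completed a line, so the mover has lost
    elif "." not in board:
        value = 0   # full board, nobody won
    else:
        value = max(-annotate(child) for child in children(board))
    table[board] = value
    return value

def best_move(board):
    """No search: pick the child position that is worst for the opponent."""
    return min(children(board), key=lambda child: table[child])

START = "." * 9
annotate(START)
```

Once `annotate(START)` has run, `best_move` never looks more than one move ahead; all the "thinking" lives in `table`. For chess that table is the part that cannot be stored, which is where approximating the tree or approximating the annotations comes in.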