I am curious why the author chose a genetic algorithm rather than standard backprop to distill the evil. Logistic regression seems like a pretty reasonable choice and it’ll be a lot faster than a genetic algorithm. Add an L1 penalty for sparsity.
In the past I’ve tried distilling not just the eval function but the outcome of the search (something like depth 20) in a neural net. It kind of works, but not very well until the net is pretty big. Deepmind did something similar.
replies(1):