Important machine learning equations

MSE remains my favorite distance measure by a long shot. Its quadratic nature still helps even in non-linear problem spaces where convexity is no longer guaranteed. When working with generic/raw binary data, where Hamming distance would in theory be the better fit, I still prefer MSE computed over byte-level values because of this property.
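To make the byte-level comparison concrete, here is a minimal Python sketch. The function names `mse_bytes` and `hamming_bits` and the sample byte strings are made up for illustration; they are not from any library. It only shows how the two measures can rank the same candidates differently, not which one trains better.

```python
def mse_bytes(a: bytes, b: bytes) -> float:
    """Mean squared error, treating each byte as an integer in 0..255."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def hamming_bits(a: bytes, b: bytes) -> int:
    """Hamming distance: the number of bit positions that differ."""
    if len(a) != len(b):
        raise ValueError("inputs must be of equal length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Illustrative values only.
target = bytes([128, 128, 128, 128])
near = bytes([127, 127, 127, 127])  # numerically adjacent, but every bit flips
far = bytes([0, 0, 0, 0])           # numerically distant, but only one bit flips

mse_near, mse_far = mse_bytes(target, near), mse_bytes(target, far)
ham_near, ham_far = hamming_bits(target, near), hamming_bits(target, far)

print("MSE     near/far:", mse_near, mse_far)
print("Hamming near/far:", ham_near, ham_far)
```

In this example MSE ranks `near` as much closer to `target` than `far`, while Hamming distance ranks them the other way around. MSE also grows with the square of the per-byte difference, whereas Hamming distance only counts flipped bits.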

Other fitness measures take much longer to converge or bootstrap very unreliably. MSE can start dead cold, threading the needle through 20 hidden layers, and still give you a workable gradient in a short period of time.
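One property that may contribute to this, offered as a sketch rather than an explanation of the 20-layer case: the gradient of MSE with respect to each prediction is linear in the error, so a bad starting guess gets a proportionally larger push. The helper names `mse` and `mse_grad` and the sample values below are hypothetical, and the example covers only the loss itself, not a network.

```python
def mse(pred, target):
    """Mean squared error over two equal-length sequences of floats."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def mse_grad(pred, target):
    """Gradient of MSE with respect to each prediction: 2 * (p - t) / n."""
    n = len(pred)
    return [2.0 * (p - t) / n for p, t in zip(pred, target)]


# Illustrative values only.
target = [1.0, 0.0]
cold_start = [-9.0, 10.0]  # far from the target
warm_start = [0.75, 0.25]  # close to the target

grad_cold = mse_grad(cold_start, target)
grad_warm = mse_grad(warm_start, target)

print("loss cold/warm:", mse(cold_start, target), mse(warm_start, target))
print("grad cold:", grad_cold)
print("grad warm:", grad_warm)
```

The gradient scales with the size of the error and never flattens out on its own. Whether that signal survives backpropagation through many layers depends on the activations and initialization, which this sketch does not model.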