
302 points sebg | 1 comments
1. bob1029 No.45051528
MSE remains my favorite distance measure by a long shot. Its quadratic nature still helps even in non-linear problem spaces where convexity is no longer guaranteed. When working with generic/raw binary data, where Hamming distance would theoretically be the better fit, I still prefer MSE computed over byte-level values because of this property.
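
To make the contrast concrete, here is a minimal sketch comparing the two measures on raw bytes. The function names `mse_bytes` and `hamming_bits` are made up for illustration, and the single-byte inputs are chosen to exaggerate the difference; this is not necessarily how the comparison would be set up in a real fitness function.

```python
def mse_bytes(a: bytes, b: bytes) -> float:
    """Mean squared error over byte values (each in 0..255)."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def hamming_bits(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


target = bytes([128])  # 0b10000000
near = bytes([127])    # 0b01111111: numerically 1 away, yet all 8 bits differ
far = bytes([0])       # 0b00000000: numerically 128 away, yet only 1 bit differs

print(mse_bytes(target, near), hamming_bits(target, near))  # 1.0 8
print(mse_bytes(target, far), hamming_bits(target, far))    # 16384.0 1
```

The two measures rank these candidates in opposite order. MSE treats each byte as a magnitude, so its error shrinks smoothly as a value approaches the target, whereas Hamming distance only counts bit flips.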

Other fitness measures take much longer to converge or are very unreliable in how they bootstrap. MSE can start from dead cold, with nothing to go on, thread the needle through 20 hidden layers, and still give you a workable gradient in a short period of time.
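
One plausible reason for the cold-start behavior is that the MSE gradient with respect to each prediction is just the residual scaled by a constant, so a badly wrong output still yields a proportionally large signal. The sketch below shows only that output-level gradient; the helper names are hypothetical, and it says nothing by itself about how the gradient survives backpropagation through many hidden layers.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences of floats."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def mse_grad(pred, target):
    """Gradient of MSE with respect to each prediction: 2 * (p - t) / n."""
    n = len(pred)
    return [2.0 * (p - t) / n for p, t in zip(pred, target)]


# A "cold" start: every prediction is zero, nothing has been learned yet.
pred = [0.0, 0.0, 0.0, 0.0]
target = [1.0, -2.0, 0.5, 4.0]

print(mse(pred, target))       # 5.3125
print(mse_grad(pred, target))  # [-0.5, 1.0, -0.25, -2.0]
```

The largest gradient entry belongs to the prediction that is furthest from its target, which is the property the quadratic form provides.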