derbOac:
I love this stuff because it's so counterintuitive until you've worked through some of it. There was an article linked to on HN a while back about high-dimensional Gaussian distributions that was similar in message, and probably mathematically related at some level. It has so many implications for much of the work in deep learning and large data, among other things.
vqv:
Statistician here.

I agree that some of this stuff seems counterintuitive on the surface. Once you make the connection with high-dimensional Gaussians, it can become more "obvious": if Z is a standard n-dimensional Gaussian random vector, i.e. one with iid N(0,1) coordinates, then dividing Z by its norm W = ||Z|| gives a random vector U that is uniformly distributed on the unit sphere in R^n. Moreover, U is independent of W (this is related to the fact that the sample mean and sample variance are independent for a random sample from a Normal population), and W^2 has a chi-squared distribution with n degrees of freedom. So, for example, a statement about concentration of the volume of the sphere about an equatorial slice is equivalent to a statement about the probability that the dot product between U and a fixed unit vector is close to 0, and that probability is easy to approximate with undergraduate-level probability theory: the dot product has mean 0 and variance 1/n, so it lies within a few multiples of 1/sqrt(n) of 0 with high probability.
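All three claims are easy to check numerically. Here is a minimal Monte Carlo sketch in Python/NumPy (the dimension n = 100, sample count, and seed are arbitrary choices of mine, not from the comment above):

```python
import numpy as np

rng = np.random.default_rng(0)
n, num_samples = 100, 50_000

# Z: standard Gaussian vectors in R^n; W = ||Z||; U = Z / W.
Z = rng.standard_normal((num_samples, n))
W = np.linalg.norm(Z, axis=1)
U = Z / W[:, None]

# W^2 should be chi-squared with n degrees of freedom: mean n, variance 2n.
print(np.mean(W**2), np.var(W**2))   # ~ 100, ~ 200

# Dot product of U with the fixed unit vector e_1 is just U's first
# coordinate; it should concentrate near 0 at scale 1/sqrt(n).
t = U[:, 0]
print(np.std(t), 1 / np.sqrt(n))     # both ~ 0.1

# Crude independence check: correlation between W and a coordinate of U.
print(np.corrcoef(W, t)[0, 1])       # ~ 0
```

The last two checks are the equatorial-slice statement in disguise: since U . e_1 stays within O(1/sqrt(n)) of 0, most of the sphere's volume sits within an O(1/sqrt(n)) band around any equator.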

Circling back to data: it is very easy to be misled when working with high-dimensional data, i.e. data with many, many features.
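One concrete way this bites (my illustration, not something claimed above) is distance concentration: for random points with many independent features, pairwise Euclidean distances bunch together, so the "nearest" neighbor is barely nearer than the farthest. A quick sketch, again with arbitrary sizes:

```python
import numpy as np

rng = np.random.default_rng(1)
for n in (2, 10, 100, 1000):
    # 1,000 points drawn uniformly from the unit cube [0, 1]^n.
    X = rng.random((1000, n))
    q = rng.random(n)                       # a query point
    d = np.linalg.norm(X - q, axis=1)
    # Relative contrast: how much farther the farthest point is
    # than the nearest. It shrinks toward 0 as n grows.
    print(n, (d.max() - d.min()) / d.min())
```

So a nearest-neighbor search that is perfectly sensible with 2 features can be close to meaningless with 1,000, even though nothing about the code or the data "looks" wrong.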