152 points by lapnect | 1 comment
mjhay No.41915325
Great article, but I wish it had made a more explicit mention of the central limit theorem (CLT)*, which I think is what makes the normal distribution "normal." For those not familiar, here is the gist: suppose you have `n` independent, finite-variance random variables with support in the real numbers (so things like count R.V.s work). As n -> infinity, the distribution of their (suitably standardized) mean approaches a normal distribution. Usually, n doesn't have to be big for this to be a reasonable approximation; n ~ 30 is often fine. The CLT extends in a number of directions beyond this basic setting.
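
Here's a tiny numpy sketch of the basic statement (my own illustration, not from the article; the Exponential(1) example and all the numbers are arbitrary choices): sample means of a heavily skewed distribution at n ~ 30 already track the standard normal density.

    # Illustrative CLT sketch (mine): means of n i.i.d. Exponential(1)
    # draws, standardized, compared to the standard normal pdf.
    import numpy as np

    rng = np.random.default_rng(0)
    n, trials = 30, 200_000
    means = rng.exponential(1.0, size=(trials, n)).mean(axis=1)

    # CLT prediction: mean of n Exp(1) draws ~ Normal(1, 1/n)
    z = (means - 1.0) * np.sqrt(n)   # standardize (mu = sigma = 1 for Exp(1))
    hist, edges = np.histogram(z, bins=9, range=(-3, 3), density=True)
    centers = (edges[:-1] + edges[1:]) / 2
    pdf = np.exp(-centers**2 / 2) / np.sqrt(2 * np.pi)
    for c, h, p in zip(centers, hist, pdf):
        print(f"z={c:+.2f}: empirical {h:.3f}  normal pdf {p:.3f}")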

To me, this is one of the most astonishing things about probability theory, as well as one of the most useful.

The normal distribution is just one of a class of "stable distributions," all sharing the property that sums of their i.i.d. R.V.s stay in the same family.

The same idea can be generalized much further. The underlying theme is the distribution of "things" as they get asymptotically "bigger." For example, the eigenvalue density of large random symmetric matrices with i.i.d. entries approaches the Wigner semicircle distribution, which is exactly what it sounds like. It plays the role of the normal distribution in the practically promising theory of free (noncommutative) probability.

https://en.wikipedia.org/wiki/Wigner_semicircle_distribution
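
You can watch the semicircle emerge with a few lines of numpy (my own sketch; the matrix size and bin count are arbitrary):

    # Eigenvalue density of a random symmetric matrix vs. the semicircle law.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 2000
    a = rng.standard_normal((n, n))
    w = (a + a.T) / np.sqrt(2)                 # symmetrize; off-diagonal variance stays 1
    eigs = np.linalg.eigvalsh(w) / np.sqrt(n)  # scaled eigenvalues land in [-2, 2]

    hist, edges = np.histogram(eigs, bins=40, range=(-2, 2), density=True)
    centers = (edges[:-1] + edges[1:]) / 2
    semicircle = np.sqrt(np.maximum(4 - centers**2, 0)) / (2 * np.pi)
    for i in range(0, 40, 8):
        print(f"x={centers[i]:+.2f}: empirical {hist[i]:.3f}  semicircle {semicircle[i]:.3f}")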

Further reading:

https://terrytao.wordpress.com/2010/01/05/254a-notes-2-the-c...

*There are a few normal-distribution CLTs, but this is the intuitive one that usually matters in practice.

replies(2): >>41916797, >>41917923
mturmon No.41917923
> ...most astonishing things about probability theory...

It's a core result, perhaps the most useful one in standard probability theory.

But from some points of view, the CLT is not actually astonishing.

If you know what Terry Tao (in the convenient link above) calls the "Fourier-analytic proof," the CLT for IID variables can seem inevitable, as long as the underlying distribution has finite variance, so that the characteristic function of the first summand (the Fourier transform of its density; equivalently, the MGF evaluated at imaginary arguments) has a second-order Taylor expansion at the origin.

I'd be interested to hear if you have sympathy with the following reasoning:

The Gaussian distribution corresponds to a characteristic function that behaves like 1 - t^2/2 to second order around the origin. You only care about behavior around the origin because the characteristic function of the normalized sum is phi(t/sqrt(N))^N, so as N -> \infty, only the values of phi near zero matter.

Because of the way we normalized the sum (we subtracted the mean), the first-order term vanishes: we purposely zeroed it out by centering the sum around zero. That leaves the second-order term, which gives a Gaussian.

So in short:

    - Finite variance of one summand => its characteristic function phi has a second-order Taylor expansion at the origin
    - We have an expression for the characteristic function of the recentered, rescaled sum: phi(t/sqrt(N))^N (convolution property)
    - Only phi's behavior around the origin matters as N -> \infty
    - Re-centering the sum makes the first-order term vanish
    - phi(t/sqrt(N))^N -> e^{-t^2/2}, and inverting recovers the Gaussian
I'm not being precise here, but I hope the idea comes through.
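
The hand-waving is easy to check numerically. A small sketch (mine; the centered Exponential(1) summand is just an arbitrary finite-variance example):

    # Characteristic function of the recentered, rescaled sum vs. the Gaussian.
    import numpy as np

    def phi(t):
        # CF of X - 1 where X ~ Exponential(1): e^{-it} / (1 - it)
        return np.exp(-1j * t) / (1 - 1j * t)

    t = np.linspace(-3, 3, 7)
    gauss = np.exp(-t**2 / 2)               # CF of the standard normal
    for N in (1, 10, 100, 1000):
        cf_sum = phi(t / np.sqrt(N)) ** N   # convolution property
        err = np.max(np.abs(cf_sum - gauss))
        print(f"N={N:5d}: max |phi(t/sqrt(N))^N - exp(-t^2/2)| = {err:.4f}")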
replies(1): >>42022259
mjhay No.42022259
Hi, I didn't see this reply before, but I think that's a wonderfully simple way of looking at it. Thanks for the intuition and step-by-step construction.

That makes me think of the normal distribution and the heat kernel, which I'd be very interested to hear your thoughts on. The heat kernel is the Green's function of the heat equation, which governs heat or other diffusive transport in the absence of advection (bulk material motion carrying the quantity along with it):

∂T/∂t = ∂²T/∂x²

If the initial condition is a unit spike of heat at the origin and zero everywhere else (a Dirac delta), the solution at any time t > 0 is exactly a normal density: T(x, t) = exp(-x^2 / (4t)) / sqrt(4*pi*t), with mean zero and variance 2t. The variance simply grows as time goes on and the thermal energy continues to diffuse.
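
That correspondence is easy to see numerically too. A minimal sketch (my own; the grid, time step, and periodic wraparound at the far-away boundary are arbitrary shortcuts): diffusing a discrete delta with explicit finite differences reproduces the Gaussian heat kernel.

    # Diffuse a discrete Dirac delta and compare to the exact heat kernel.
    import numpy as np

    x = np.linspace(-5, 5, 101)
    dx = x[1] - x[0]                 # 0.1
    dt = 0.004                       # stable: dt < dx^2 / 2
    T = np.zeros_like(x)
    T[50] = 1.0 / dx                 # unit-mass spike at x = 0

    steps = 250                      # evolve to t = steps * dt = 1.0
    for _ in range(steps):
        lap = (np.roll(T, 1) - 2 * T + np.roll(T, -1)) / dx**2  # periodic wrap, harmless here
        T = T + dt * lap

    t = steps * dt
    kernel = np.exp(-x**2 / (4 * t)) / np.sqrt(4 * np.pi * t)   # normal density, variance 2t
    print(f"max |numeric - heat kernel| at t = {t}: {np.max(np.abs(T - kernel)):.4f}")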

Thinking about your intuition: the first spatial derivative at the origin is zero here as well (the heat should diffuse the same in every direction absent any conductivity anisotropy). And for sufficiently small distances and sufficiently large t > 0, the profile near the origin is nearly flat to second order too.

Because the heat kernel is flat to second (not first) order near the origin, the heat flux there vanishes to second order. At sufficiently large distances/small times, the flux vanishes to any order. The sweet spot is in the middle, on the steep part of the heat kernel/Gaussian. There, the difference between heat entering and exiting a point is infinitesimally small, to first order in the temperature gradient. But that doesn't mean that heat transport in and out of a point is proportional to the temperature gradient! One point can only transport heat to infinitesimally nearby points, and the temperature difference between infinitesimally close points, where the gradient itself is infinitesimal, is doubly infinitesimal.
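
To make the "doubly infinitesimal" point concrete, a toy check (mine; the Gaussian profile and the point x0 = 1 are arbitrary): the first difference across a small step scales like dx, while the centered second difference, the thing that actually drives diffusion, scales like dx^2.

    # First differences shrink like dx; second differences like dx^2.
    import numpy as np

    T = lambda x: np.exp(-x**2 / 4)   # a Gaussian-ish temperature profile
    x0 = 1.0                          # a point on the steep part of the kernel
    for dx in (0.1, 0.01, 0.001):
        d1 = T(x0 + dx) - T(x0)                    # ~ dx * T'(x0): singly small
        d2 = T(x0 + dx) - 2 * T(x0) + T(x0 - dx)   # ~ dx^2 * T''(x0): doubly small
        print(f"dx={dx:6}: first diff {d1:+.2e}  second diff {d2:+.2e}")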