
181 points | jxmorris12 | 1 comment
nobodywillobsrv No.43111951
Softmax’s exponential comes from counting occupation states. Maximize the number of ways to arrange things with the logits as energies, and you get exp(logits) over a partition function, pure Boltzmann style. It’s optimal in the sense that it’s how probability naturally piles up.
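As a hedged illustration (plain numpy, my own variable names, not any particular library's API), softmax is literally unnormalized Boltzmann weights divided by a partition function:

    import numpy as np

    def softmax(logits):
        # logits act as negative energies; exp() gives the Boltzmann weights
        weights = np.exp(logits - logits.max())  # max-shift for numerical stability
        return weights / weights.sum()           # weights.sum() is the partition function Z

    print(softmax(np.array([2.0, 1.0, 0.1])))    # roughly [0.66, 0.24, 0.10]

Subtracting the max only rescales the partition function, so the probabilities are unchanged; it just avoids overflow.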
replies(2): >>43111971 #>>43113945 #
semiinfinitely No.43111971
right, and it should be totally obvious that we would choose an energy function from statistical mechanics to train our hotdog-or-not classifier
replies(3): >>43112080 #>>43112333 #>>43112585 #
xelxebar No.43112333
The connection isn't immediately obvious, but it's simply because solving for the maximum-entropy distribution that achieves a given expectation value produces the Boltzmann distribution. In stat mech, our "classifier" over (micro-)states is the energy; in AI, the classifier is over labels.

For details, the keyword is Lagrange multipliers [0]. The specific application here is maximizing f, taken to be the entropy, subject to the constraint g, the fixed expectation value.
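A sketch of that calculation in my own notation (E_i plays the role of a negative logit; α and β are the multipliers):

    maximize    S = -\sum_i p_i \ln p_i
    subject to  \sum_i p_i = 1   and   \sum_i p_i E_i = \langle E \rangle

    \mathcal{L} = -\sum_i p_i \ln p_i + \alpha\Big(\sum_i p_i - 1\Big) - \beta\Big(\sum_i p_i E_i - \langle E \rangle\Big)

    \partial\mathcal{L}/\partial p_i = -\ln p_i - 1 + \alpha - \beta E_i = 0
    \;\Rightarrow\; p_i = e^{-\beta E_i}/Z,  \quad  Z = \sum_j e^{-\beta E_j}

With E_i = -logit_i and β = 1, this is exactly softmax over the logits.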

If you're like me at all, the above will be a nice short rabbit hole to go down!

[0]: https://tutorial.math.lamar.edu/classes/calciii/lagrangemult...