
181 points | jxmorris12 | 1 comment
nobodywillobsrv No.43111951
Softmax’s exponential comes from counting occupation states. Maximize the number of ways to arrange things with the logits as energies, and you get exp(logits) over a partition function, pure Boltzmann style. It’s optimal in the sense that it’s how probability naturally piles up.
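As a hedged illustration (plain numpy, my own variable names, not any particular library's API), softmax is literally unnormalized Boltzmann weights divided by a partition function:

    import numpy as np

    def softmax(logits):
        # logits act as negative energies; exp() gives the Boltzmann weights
        weights = np.exp(logits - logits.max())  # max-shift for numerical stability
        return weights / weights.sum()           # weights.sum() is the partition function Z

    print(softmax(np.array([2.0, 1.0, 0.1])))    # roughly [0.66, 0.24, 0.10]

Subtracting the max only rescales the partition function, so the probabilities are unchanged; it just avoids overflow.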
replies(2): >>43111971 #>>43113945 #
semiinfinitely No.43111971
right, and it should be totally obvious that we would choose an energy function from statistical mechanics to train our hotdog-or-not classifier
replies(3): >>43112080 #>>43112333 #>>43112585 #
xelxebar No.43112333
The connection isn't immediately obvious, but it's simply because solving for the maximum-entropy distribution that achieves a given expectation value produces the Boltzmann distribution. In stat mech, our "classifier" over (micro-)states is the energy; in AI, the classifier is over labels.

For details, the keyword is Lagrange multipliers [0]. The specific application here is maximizing f, taken to be the entropy, subject to the constraint g, the fixed expectation value.
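A sketch of that calculation in my own notation (E_i plays the role of a negative logit; α and β are the multipliers):

    maximize    S = -\sum_i p_i \ln p_i
    subject to  \sum_i p_i = 1   and   \sum_i p_i E_i = \langle E \rangle

    \mathcal{L} = -\sum_i p_i \ln p_i + \alpha\Big(\sum_i p_i - 1\Big) - \beta\Big(\sum_i p_i E_i - \langle E \rangle\Big)

    \partial\mathcal{L}/\partial p_i = -\ln p_i - 1 + \alpha - \beta E_i = 0
    \;\Rightarrow\; p_i = e^{-\beta E_i}/Z,  \quad  Z = \sum_j e^{-\beta E_j}

With E_i = -logit_i and β = 1, this is exactly softmax over the logits.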

If you're like me at all, the above will be a nice short rabbit hole to go down!

[0]: https://tutorial.math.lamar.edu/classes/calciii/lagrangemult...