(kyunghyuncho.me)

184 points jxmorris12 | 1 comments | 16 Feb 25 07:08 UTC

nobodywillobsrv ◴[20 Feb 25 07:12 UTC] No.43111951

Softmax’s exponential comes from counting occupation states. Maximize the number of ways to arrange things, with the logits playing the role of (negative) energies, and you get exp(logits) over a partition function, pure Boltzmann style. It’s “optimal” because that’s how probability naturally piles up: it’s the overwhelmingly most probable arrangement given the constraints.
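The counting argument can be checked by brute force. This is a minimal sketch under illustrative assumptions (four levels with energies 0, 1, 2, 3, N = 60 particles, total energy 60; the helper names `most_probable_occupation`, `boltzmann` and `solve_beta` are hypothetical). It finds the occupation numbers with the largest multiplicity W = N!/(n0! n1! n2! n3!) and compares them with the Boltzmann prediction N * exp(-beta * E_i) / Z, which is exactly a softmax over the logits -beta * E_i.

```python
import math

def most_probable_occupation(n_particles, total_energy):
    """Brute-force the occupation numbers (n0, n1, n2, n3) of levels with
    energies 0, 1, 2, 3 (hard-coded assumption) that maximize the
    multiplicity W = N! / (n0! n1! n2! n3!) at fixed N and total energy."""
    best, best_log_w = None, -math.inf
    for n3 in range(total_energy // 3 + 1):
        for n2 in range((total_energy - 3 * n3) // 2 + 1):
            n1 = total_energy - 2 * n2 - 3 * n3  # energy constraint
            n0 = n_particles - n1 - n2 - n3      # particle-count constraint
            if n0 < 0:
                continue
            # log W, using lgamma(n + 1) == log(n!)
            log_w = math.lgamma(n_particles + 1) - sum(
                math.lgamma(n + 1) for n in (n0, n1, n2, n3))
            if log_w > best_log_w:
                best, best_log_w = (n0, n1, n2, n3), log_w
    return best

def boltzmann(energies, beta):
    """exp(-beta * E_i) / Z, i.e. softmax of the logits -beta * E_i."""
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)  # partition function
    return [w / z for w in weights]

def solve_beta(energies, mean_energy, lo=-50.0, hi=50.0, iters=200):
    """Bisection for the beta whose Boltzmann mean energy hits the target
    (the mean energy decreases monotonically as beta grows)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        p = boltzmann(energies, mid)
        if sum(pi * e for pi, e in zip(p, energies)) > mean_energy:
            lo = mid  # too much energy: need a larger beta
        else:
            hi = mid
    return (lo + hi) / 2

N, E = 60, 60
energies = [0, 1, 2, 3]  # must match the levels hard-coded above
counts = most_probable_occupation(N, E)
p = boltzmann(energies, solve_beta(energies, E / N))
print(counts)                          # (25, 17, 11, 7)
print([round(N * pi, 1) for pi in p])  # Boltzmann prediction N * p_i
```

Even at N = 60 the brute-force winner sits within one particle of N * p_i on every level; the relative agreement tightens as N grows, which is where Stirling's approximation and the exp(-beta * E_i) form come from.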

semiinfinitely ◴[20 Feb 25 07:16 UTC] No.43111971

>>43111951

right and it should be totally obvious that we would choose an energy function from statistical mechanics to train our hotdog-or-not classifier

Y_Y ◴[20 Feb 25 09:00 UTC] No.43112585

>>43111971

The way that energy comes in is that you have a fixed (conserved) amount of it and you have to portion it out among your states. There's nothing inherently energy-related about it; it just happens that we often want to look at energy distributions, and lots of physical systems distribute energy this way (because it's the energy distribution with maximal entropy given the constraints).
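The maximal-entropy claim can be checked numerically. A minimal sketch, assuming four illustrative energy levels (0, 1, 2, 3) and an arbitrary inverse temperature beta = 0.5 (neither comes from the thread): it builds the Boltzmann distribution as a softmax of the logits -beta * E_i, nudges it along directions that keep both the total probability and the mean energy fixed, and checks that every valid nudge has lower entropy.

```python
import math
import random

def softmax(logits):
    m = max(logits)  # shift by the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)  # partition function (up to the factor exp(m))
    return [e / total for e in exps]

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def mean(p, values):
    return sum(pi * v for pi, v in zip(p, values))

energies = [0.0, 1.0, 2.0, 3.0]  # illustrative energy levels
beta = 0.5                       # illustrative inverse temperature
p_star = softmax([-beta * e for e in energies])  # Boltzmann distribution
h_star = entropy(p_star)

# Directions that leave sum(p) and the mean energy unchanged:
# each sums to 0 and has zero dot product with `energies`.
v1 = [1.0, -2.0, 1.0, 0.0]
v2 = [0.0, 1.0, -2.0, 1.0]

rng = random.Random(0)
trials = worse = 0
for _ in range(10_000):
    t1, t2 = rng.uniform(-0.05, 0.05), rng.uniform(-0.05, 0.05)
    q = [p + t1 * a + t2 * b for p, a, b in zip(p_star, v1, v2)]
    if min(q) <= 0:
        continue  # not a valid probability distribution
    trials += 1
    # same "fixed amount of energy" as the Boltzmann distribution
    assert abs(mean(q, energies) - mean(p_star, energies)) < 1e-12
    worse += entropy(q) < h_star
print(worse == trials)  # True: every same-energy alternative has less entropy
```

Because entropy is strictly concave, the check passes for any choice of beta; the Lagrange-multiplier derivation turns it into the exact statement that exp(-beta * E_i) / Z is the unique maximizer under those constraints.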

(After I wrote this I saw the sibling comment from xelxebar which is a better way of saying the same thing.)

Softmax forever, or why I like softmax