181 points | jxmorris12 | 1 comment
nobodywillobsrv:
Softmax’s exponential comes from counting occupation states. Maximize the ways to arrange things with logits as energies, and you get exp(logits) over a partition function, pure Boltzmann style. It’s optimal because it’s how probability naturally piles up.
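A minimal sketch of the idea above: softmax is exactly exp(logits) divided by a partition function, with the standard max-shift trick for numerical stability (the shift cancels in the ratio, so the probabilities are unchanged). The function name and test values here are illustrative, not from the thread.

```python
import numpy as np

def softmax(logits):
    """Boltzmann-style softmax: exp(logit_i) normalized by the partition function Z."""
    z = logits - np.max(logits)      # shift for numerical stability; cancels in the ratio
    weights = np.exp(z)              # "Boltzmann factors" exp(logit_i)
    return weights / weights.sum()   # divide by the partition function Z

p = softmax(np.array([2.0, 1.0, 0.1]))
print(p, p.sum())                    # probabilities sum to 1, ordered like the logits
```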
semiinfinitely:
right and it should be totally obvious that we would choose an energy function from statistical mechanics to train our hotdog-or-not classifier
Y_Y:
The way that energy comes in is that you have a fixed (conserved) amount of it and you have to portion it out among your states. There's nothing inherently energy-related about it; it just happens that we often want to look at energy distributions, and lots of physical systems distribute energy this way (because it's the distribution with maximal entropy given the constraints).

(After I wrote this I saw the sibling comment from xelxebar which is a better way of saying the same thing.)
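The maximal-entropy claim above can be made precise with the standard Lagrange-multiplier derivation (textbook statistical mechanics, not spelled out in the thread): maximize the entropy $S = -\sum_i p_i \ln p_i$ subject to normalization $\sum_i p_i = 1$ and a fixed mean energy $\sum_i p_i E_i = \bar{E}$.

```latex
\mathcal{L} = -\sum_i p_i \ln p_i
  - \alpha\Big(\sum_i p_i - 1\Big)
  - \beta\Big(\sum_i p_i E_i - \bar{E}\Big)

\frac{\partial \mathcal{L}}{\partial p_i}
  = -\ln p_i - 1 - \alpha - \beta E_i = 0
\quad\Longrightarrow\quad
p_i = \frac{e^{-\beta E_i}}{Z},
\qquad Z = \sum_j e^{-\beta E_j}
```

Identifying logits with $\ell_i = -\beta E_i$ turns this Boltzmann distribution into exactly the softmax $p_i = e^{\ell_i} / \sum_j e^{\ell_j}$.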