181 points jxmorris12 | 1 comment
1. janalsncm No.43112900
This is a really intuitive explanation, thanks for posting. I think everyone’s first intuition for “how do we turn these logits into probabilities” is to divide the absolute value of each number by the sum of the absolute values. The seemingly unjustified complexity of softmax annoyed me in college.

The author gives a really clean explanation for why that’s hard for a network to learn, starting from first principles.
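
For anyone who wants to poke at the difference, here is a minimal sketch of the two normalizations mentioned above, assuming NumPy. The function names and the example logits are made up for illustration, and this is not necessarily the argument the linked article makes. It only shows one visible problem with the absolute-value approach: the sign of a logit is thrown away, so a strongly negative logit gets the same probability as a strongly positive one.

```python
import numpy as np

def abs_normalize(logits):
    """Naive idea: divide each absolute value by the sum of absolute values.
    (Hypothetical helper; undefined when every logit is zero.)"""
    mags = np.abs(logits)
    return mags / mags.sum()

def softmax(logits):
    """Standard softmax, shifted by the max for numerical stability."""
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()

# Example logits chosen for illustration: one large positive, one large negative.
logits = np.array([2.0, -2.0, 0.5])

p_abs = abs_normalize(logits)
p_soft = softmax(logits)

print("abs-normalized:", p_abs)
print("softmax:       ", p_soft)
```

With the absolute-value version, the logits 2.0 and -2.0 end up with identical probabilities, while softmax keeps the ordering of the logits. Softmax is also unchanged if you add the same constant to every logit, which the absolute-value version is not.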