To answer the question "is softmax the only way to turn unnormalized real values into a categorical distribution": no, you can just use statistics.
E.g. using Bayesian stats: if I assume a uniform prior (pretend I have no assumptions about how biased the coin is) and I see a coin come up heads 4 times in a row, what's the probability of the next flip being heads?
Via a long-winded proof using the Dirichlet distribution (the Beta distribution, in the two-outcome case), Bayesian stats will say "add one to the top and two to the bottom". This is Laplace's rule of succession. Here we saw 4/4 heads, so we guess a 5/6 chance of heads the next time (+1 to the top, +2 to the bottom), or a 1/6 chance of tails. This reflects the model inferring some bias towards heads without ruling tails out.
The results sum to 1, which is what we want from a probability distribution. It works for more than two outcomes as well: add 1 to the top for each outcome, and add the number of possible outcomes to the bottom. The Dirichlet distribution also allows non-integer values, so you can support real-valued counts too. If you feel this gives too much weight to the possibility of the coin being biased, you can simply add more to the top and bottom, which is the same as encoding a stronger belief in fairness in your prior, e.g. add 100 to the top and 200 to the bottom instead.
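A minimal Python sketch of the add-to-the-top-and-bottom rule described above. The function name `smoothed_probs` and the `pseudo` parameter are my own labels for illustration, not anything standard; `pseudo` is the amount added to each outcome's count (1 corresponds to the uniform prior).

```python
def smoothed_probs(counts, pseudo=1.0):
    """Add `pseudo` to every outcome's count, then normalize to sum to 1.

    `counts` may hold non-integer values; `pseudo` is a hypothetical
    parameter name for the prior strength per outcome.
    """
    k = len(counts)                      # number of possible outcomes
    total = sum(counts) + pseudo * k     # "add k * pseudo to the bottom"
    return [(c + pseudo) / total for c in counts]

# 4 heads, 0 tails, uniform prior: (4+1)/(4+2) and (0+1)/(4+2)
print([round(p, 3) for p in smoothed_probs([4, 0])])              # [0.833, 0.167]

# Stronger belief in a fair coin: add 100 to the top, 200 to the bottom
print([round(p, 3) for p in smoothed_probs([4, 0], pseudo=100)])  # [0.51, 0.49]

# Three outcomes with non-integer counts still normalize to 1
print(sum(smoothed_probs([2.5, 1.0, 0.5])))
```

With the larger pseudo-count, 4 heads in a row barely moves the estimate away from 50/50, which is the "stronger prior" effect.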
Now, this gives quite different outcomes compared to softmax. It keeps every outcome at a clearly non-zero chance, whereas softmax is built on exponentials (with two outcomes it reduces to the classic sigmoid function), so a large gap between inputs pushes the outputs to almost exactly 0 or 1. Distributions like this one are very helpful in many circumstances. Do you actually think the chance of tails becomes 0 if you see heads flipped 100 times in a row? Of course not.
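To make the contrast concrete, here is a rough Python sketch for the 100-heads case. Feeding the raw counts into softmax as if they were logits is my own assumption, purely for illustration, and `add_one` is a hypothetical helper name for the rule above.

```python
import math

def softmax(xs):
    m = max(xs)                              # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def add_one(counts):
    # Add 1 to each count and the number of outcomes to the denominator
    total = sum(counts) + len(counts)
    return [(c + 1) / total for c in counts]

counts = [100, 0]                            # 100 heads, 0 tails

p_tails_softmax = softmax(counts)[1]         # counts treated as logits (assumption)
p_tails_add_one = add_one(counts)[1]

print(p_tails_softmax)   # tiny, on the order of 1e-44
print(p_tails_add_one)   # 1/102, just under 1%
```

Softmax's value for tails is technically non-zero, but so small it is zero for practical purposes; the count-based estimate still leaves tails roughly a 1% chance.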
So anyway, the softmax function fits things to one particular type of distribution, but you can fit pretty much anything to any distribution with good old statistics. Choose the right one for your use case.