For details, the keyword is Lagrange multiplier [0]. The specific application here is maximizing f, the entropy, subject to the constraint g that the expectation value is fixed (a short sketch follows at the end of this comment).
If you're like me at all, the above will be a nice short rabbit hole to go down!
[0]:https://tutorial.math.lamar.edu/classes/calciii/lagrangemult...
(After I wrote this I saw the sibling comment from xelxebar which is a better way of saying the same thing.)
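For the curious, here is a minimal sketch of that maximization (my own notation, not from the linked tutorial: E_i are the scores/energies and \mu is the fixed expectation value):

    Maximize    H(p) = -\sum_i p_i \log p_i
    subject to  \sum_i p_i = 1   and   \sum_i p_i E_i = \mu

    \mathcal{L} = -\sum_i p_i \log p_i + \alpha \left( \sum_i p_i - 1 \right) - \beta \left( \sum_i p_i E_i - \mu \right)

    \partial \mathcal{L} / \partial p_i = -\log p_i - 1 + \alpha - \beta E_i = 0
        \implies  p_i = e^{-\beta E_i} / \sum_j e^{-\beta E_j}

That is a softmax with temperature T = 1/\beta, where the multiplier \beta is whatever value makes the constraint \sum_i p_i E_i = \mu hold; with the usual ML sign convention, logits z_i = -E_i give the familiar p_i \propto e^{z_i / T}.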
But in machine learning, that interpretation has no significance at all. In particular, to hold the average weight fixed you need to vary the temperature depending on the individual weights, but machine learning practitioners typically fix the temperature instead, so the average weight varies wildly.
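To make that concrete, here is a small numerical sketch (the score vectors, target value, and helper names softmax/weighted_mean/temperature_for_mean are all my own toy choices, not anything from the thread): with the temperature held fixed, the softmax-weighted average of the scores moves around as the scores change, and pinning it at a target value means solving for the temperature separately per score vector.

    import numpy as np
    from scipy.optimize import brentq

    def softmax(z, T=1.0):
        z = np.asarray(z, dtype=float) / T
        z = z - z.max()              # subtract the max for numerical stability
        w = np.exp(z)
        return w / w.sum()

    def weighted_mean(z, T=1.0):
        # Softmax-weighted average of the scores themselves.
        return float(np.dot(softmax(z, T), z))

    a = np.array([1.0, 2.0, 3.0])
    b = np.array([0.0, 2.0, 4.0])

    # Fixed temperature: the average depends on the individual scores.
    print(weighted_mean(a, T=1.0), weighted_mean(b, T=1.0))   # the two averages differ

    # Holding the average at a target mu instead requires solving for T
    # separately for each score vector (the Lagrange-multiplier view).
    def temperature_for_mean(z, mu, lo=1e-3, hi=1e3):
        return brentq(lambda T: weighted_mean(z, T) - mu, lo, hi)

    mu = 2.5
    print(temperature_for_mean(a, mu), temperature_for_mean(b, mu))  # the two temperatures differ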
So logits fed through a softmax are just one particular way to parameterize a categorical distribution, and there's nothing precluding another parameterization from working just as well or better.
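A tiny illustration of that point (entirely my own toy example): squaring the parameters and normalizing also yields a valid point on the probability simplex, so softmax is not special in that respect; whether any given alternative trains as well is a separate, empirical question.

    import numpy as np

    def softmax_probs(z):
        w = np.exp(z - np.max(z))
        return w / w.sum()

    def squared_probs(x):
        # Another valid parameterization of a categorical distribution:
        # square the parameters, then normalize (assumes x is not all zeros).
        s = np.square(x)
        return s / s.sum()

    x = np.array([0.5, -1.0, 2.0])
    print(softmax_probs(x).sum(), squared_probs(x).sum())  # both sum to 1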