Edit: corrected my sentence, but see 0xdde's reply for better info.
[1] https://www.microsoft.com/en-us/research/publication/pattern...
Taking a cursory look at Bishop's book, I see that I was wrong: the field has deep roots in Bayesian inference as well.
On another note, I find it very interesting that there isn't a bigger emphasis on using the correct distributions in ML models, since the methods are much more concerned with optimizing objective functions.
See Breiman's classic paper "Statistical Modeling: The Two Cultures", which this post's title references: https://projecteuclid.org/journals/statistical-science/volum...
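On the distributions-vs-objectives point, one standard way to see the link (a textbook identity, added here only as an illustration): if a model assumes y_i = f_θ(x_i) + ε_i with i.i.d. Gaussian noise ε_i ~ N(0, σ²), its negative log-likelihood is just a rescaled squared-error objective.

```latex
-\log p(y \mid x, \theta)
  = \frac{1}{2\sigma^2} \sum_{i=1}^{n} \bigl(y_i - f_\theta(x_i)\bigr)^2
  + \frac{n}{2} \log\left(2\pi\sigma^2\right)
```

The second term does not depend on θ, so minimizing MSE is maximum-likelihood estimation under that implicit Gaussian assumption. The same reasoning maps cross-entropy loss to a categorical likelihood, so picking an objective quietly picks a distribution.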
In particular, variational inference is a family of techniques that makes these kinds of problems computationally tractable by turning posterior inference into an optimization problem. It shows up everywhere from variational autoencoders to time-series state-space modeling to reinforcement learning.
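To make the "inference as optimization" idea concrete, here is a minimal sketch in plain Python (standard library only) of stochastic VI on a toy model picked so the exact posterior is known: x_i ~ N(mu, 1) with prior mu ~ N(0, 1), approximated by q(mu) = N(m, s²). The function names, learning rate, sample counts, and burn-in are hypothetical choices for illustration, not taken from any library. Gradients of the ELBO are estimated with the reparameterization trick (mu = m + s·eps), the same device VAEs rely on.

```python
import math
import random


def exact_posterior(data, prior_var=1.0, noise_var=1.0):
    """Closed-form posterior for x_i ~ N(mu, noise_var), mu ~ N(0, prior_var)."""
    precision = 1.0 / prior_var + len(data) / noise_var
    mean = (sum(data) / noise_var) / precision
    return mean, math.sqrt(1.0 / precision)


def fit_gaussian_vi(data, prior_var=1.0, noise_var=1.0, steps=3000,
                    lr=0.01, num_samples=32, burn_in=2000, seed=0):
    """Maximize the ELBO over q(mu) = N(m, s^2) by stochastic gradient ascent.

    Hypothetical helper: the step size and sample counts are illustrative.
    """
    rng = random.Random(seed)
    total, n = sum(data), len(data)

    def grad_log_joint(mu):
        # d/dmu [log p(data | mu) + log p(mu)] for this Gaussian toy model
        return (total - n * mu) / noise_var - mu / prior_var

    m, log_s = 0.0, 0.0
    m_avg, s_avg, count = 0.0, 0.0, 0
    for step in range(steps):
        s = math.exp(log_s)
        grad_m = grad_log_s = 0.0
        for _ in range(num_samples):
            eps = rng.gauss(0.0, 1.0)
            g = grad_log_joint(m + s * eps)  # reparameterized sample mu = m + s*eps
            grad_m += g
            grad_log_s += g * eps * s        # chain rule: d mu / d log_s = s * eps
        grad_m /= num_samples
        # The entropy of N(m, s^2) is log_s + const, so it adds exactly +1.
        grad_log_s = grad_log_s / num_samples + 1.0
        m += lr * grad_m
        log_s += lr * grad_log_s
        if step >= burn_in:
            # Average late iterates to smooth out Monte Carlo noise.
            m_avg += m
            s_avg += math.exp(log_s)
            count += 1
    return m_avg / count, s_avg / count


data_rng = random.Random(42)
data = [data_rng.gauss(2.0, 1.0) for _ in range(50)]

m_hat, s_hat = fit_gaussian_vi(data)
mean_exact, std_exact = exact_posterior(data)
print(f"VI:    mean={m_hat:.3f}, std={s_hat:.3f}")
print(f"exact: mean={mean_exact:.3f}, std={std_exact:.3f}")
```

Because the Gaussian family happens to contain the true posterior here, the fitted m and s should land close to the exact values. In real models the family usually can't represent the posterior exactly, and VI returns the member closest to it in KL(q || p). The fixed step size also has to stay small relative to 1 / (n/noise_var + 1/prior_var), or the updates for m diverge.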
If you want to learn more, I recommend reading Murphy's textbooks on ML: https://probml.github.io/pml-book/book2.html