Bayesian Statistics: The three cultures

1. brcmthrowaway ◴[26 Jul 24 18:57 UTC] No.41081254[source]▶

Where does Deep Learning come in?

replies(6): >>41081343 #>>41081808 #>>41081817 #>>41081946 #>>41082236 #>>41116247 #

2. thegginthesky ◴[26 Jul 24 19:08 UTC] No.41081343[source]▶

Most models are derived from Machine Learning principles that mix classic probability theory, Frequentist and Bayesian statistics, and lots of Computer Science fundamentals. But there have been advancements in Bayesian Inference and Bayesian Deep Learning; you should check out frameworks like Pyro (built on top of PyTorch).

Edit: corrected my sentence, but see 0xdde's reply for better info.

replies(1): >>41081458 #

3. 0xdde ◴[26 Jul 24 19:23 UTC] No.41081458[source]▶

>>41081343 #

I could be wrong, but my sense is that ML has leaned Bayesian for a very long time. For example, even Bishop's widely used book from 2006 [1] is Bayesian. Not sure how Bayesian his new deep learning book is.

[1] https://www.microsoft.com/en-us/research/publication/pattern...

replies(1): >>41081786 #

4. thegginthesky ◴[26 Jul 24 20:05 UTC] No.41081786{3}[source]▶

>>41081458 #

I stand corrected! It was my impression that many methods used in ML such as Support Vector Machines, Decision Trees, Random Forests, Boosting, Bagging and so on have very deep roots in Frequentist Methods, although current CS implementations lean heavily on optimizations such as Gradient Descent.

Giving Bishop's book a cursory look, I see that I am wrong, as there are deep roots in Bayesian Inference as well.

On another note, I find it very interesting that there's not a bigger emphasis on using the correct distributions in ML models, as the methods are much more concerned with optimizing objective functions.

5. tfehring ◴[26 Jul 24 20:07 UTC] No.41081808[source]▶

>>41081254 (TP) #

An implicit shared belief of all of the practitioners the author mentions is that they attempt to construct models that correspond to some underlying "data generating process". Machine learning practitioners may use similar models or even the same models as Bayesian statisticians, but they tend to evaluate their models primarily or entirely based on their predictive performance, not on intuitions about why the data is taking on the values that it is.

See Breiman's classic "Two Cultures" paper that this post's title is referencing: https://projecteuclid.org/journals/statistical-science/volum...

6. vermarish ◴[26 Jul 24 20:08 UTC] No.41081817[source]▶

>>41081254 (TP) #

At a high level, Bayesian statistics and DL share the same objective of fitting parameters to models.

In particular, variational inference is a family of techniques that makes these kinds of problems computationally tractable. It shows up everywhere from variational autoencoders, to time-series state-space modeling, to reinforcement learning.
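
For reference, the identity behind variational inference can be written as below. This is a standard textbook form and not something taken from the article; the symbols are assumptions chosen for illustration: x is the observed data, z the latent variables or parameters, and q(z) the tractable approximating distribution.

```latex
\log p(x)
  = \underbrace{\mathbb{E}_{q(z)}\big[\log p(x, z) - \log q(z)\big]}_{\mathrm{ELBO}(q)}
  + \mathrm{KL}\big(q(z)\,\|\,p(z \mid x)\big)
```

Because the KL term is non-negative, the ELBO is a lower bound on log p(x), and maximizing it over q is equivalent to pushing q(z) toward the true posterior p(z | x). That turns an intractable integration problem into an optimization problem, which is why it fits so naturally with gradient-based DL tooling.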

If you want to learn more, I recommend reading Murphy's textbooks on ML: https://probml.github.io/pml-book/book2.html

7. klysm ◴[26 Jul 24 20:24 UTC] No.41081946[source]▶

>>41081254 (TP) #

Not sure why this is being downvoted, as it’s mentioned peripherally in the article. I think it’s primarily used as an extreme example of a model whose inner mechanism is entirely inscrutable.

8. samch93 ◴[26 Jul 24 21:00 UTC] No.41082236[source]▶

>>41081254 (TP) #

A (deep) NN is just a really complicated data model; how one treats the estimation of its parameters and the prediction of new data determines whether one is a Bayesian or a frequentist. The Bayesian assigns a prior distribution to the parameters and then conditions on the data to obtain a posterior distribution, from which a posterior predictive distribution is obtained for new data. The frequentist treats parameters as fixed quantities and estimates them from the likelihood alone, e.g., with maximum likelihood (potentially using some hacks such as regularization, which themselves can be given a Bayesian interpretation).
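
A minimal sketch of that contrast, using linear regression as a stand-in for the NN so that everything has a closed form. The data, the noise level `sigma`, and the prior scale `tau` are all made up for illustration. Under those assumptions (Gaussian likelihood, zero-mean Gaussian prior), the posterior mean coincides with the ridge estimate when the penalty is sigma^2 / tau^2, which is one concrete case of regularization having a Bayesian interpretation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from a linear model (all values are illustrative assumptions).
n, d = 50, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.5, -2.0, 0.5])
sigma = 0.3  # assumed known noise standard deviation
tau = 1.0    # assumed prior standard deviation of each weight
y = X @ w_true + sigma * rng.normal(size=n)

# Frequentist: maximum likelihood, which is ordinary least squares here.
w_mle = np.linalg.lstsq(X, y, rcond=None)[0]

# Frequentist with regularization: ridge regression with penalty lam.
lam = sigma**2 / tau**2
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Bayesian: prior w ~ N(0, tau^2 I), likelihood y | w ~ N(Xw, sigma^2 I).
# The posterior over w is Gaussian with this covariance and mean.
post_cov = np.linalg.inv(X.T @ X / sigma**2 + np.eye(d) / tau**2)
post_mean = post_cov @ (X.T @ y) / sigma**2

# Posterior predictive for a new input: a distribution, not a point estimate.
x_new = np.array([0.2, -0.1, 1.0])
pred_mean = x_new @ post_mean
pred_var = x_new @ post_cov @ x_new + sigma**2  # parameter + noise uncertainty

print("MLE:           ", w_mle)
print("Ridge:         ", w_ridge)
print("Posterior mean:", post_mean)
print("Predictive mean and variance:", pred_mean, pred_var)
```

The point estimates agree, but only the Bayesian treatment carries `post_cov` forward into the prediction. For a deep NN the posterior has no closed form, so it has to be approximated.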

9. esafak ◴[31 Jul 24 03:46 UTC] No.41116247[source]▶

>>41081254 (TP) #

https://en.wikipedia.org/wiki/Statistical_learning_theory