
Bayesian Statistics: The three cultures

(statmodeling.stat.columbia.edu)
309 points | luu
tfehring No.41081746
The author is claiming that Bayesians vary along two axes: (1) whether they generally try to inform their priors with their knowledge or beliefs about the world, and (2) whether they iterate on the functional form of the model based on its goodness-of-fit and the reasonableness and utility of its outputs. He then labels 3 of the 4 resulting combinations as follows:

    ┌───────────────┬───────────┬──────────────┐
    │               │ iteration │ no iteration │
    ├───────────────┼───────────┼──────────────┤
    │ informative   │ pragmatic │ subjective   │
    │ uninformative │     -     │ objective    │
    └───────────────┴───────────┴──────────────┘
My main disagreement with this model is the empty bottom-left box - in fact, I think that's where most self-labeled Bayesians in industry fall:

- Iterating on the functional form of the model (and therefore the assumed underlying data generating process) is generally considered obviously good and necessary, in my experience.

- Priors are usually uninformative or weakly informative, partly because data is often big enough to overwhelm the prior.
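To make that second point concrete, here's a minimal sketch with a conjugate Beta-Binomial model (the prior and the counts below are made up for illustration): once n is large, a weak prior barely moves the posterior.

    from scipy import stats

    # Weakly informative prior: Beta(2, 2), gently centered on 0.5.
    prior_a, prior_b = 2.0, 2.0

    # Hypothetical data: 100,000 trials, 30,200 successes.
    n, k = 100_000, 30_200

    # Conjugate update: posterior is Beta(a + k, b + n - k).
    posterior = stats.beta(prior_a + k, prior_b + n - k)

    # With this much data the posterior mean is essentially the MLE
    # k/n = 0.302; the prior's pull is negligible.
    print(posterior.mean())          # ~0.302
    print(posterior.interval(0.95))  # narrow interval around 0.302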

The need for iteration feels so obvious to me that the entire "no iteration" column feels like a straw man. But the author, who knows far more academic statisticians than I do, explicitly says that he had the same belief and "was shocked to learn that statisticians didn’t think this way."

replies(3): >>41081867 #>>41082105 #>>41084103 #
klysm No.41081867
The no-iteration thing is very real, and I don’t think it’s for particularly bad reasons either. We iterate on models to make them better, by some definition of better. It’s no secret that scientific work is subject to rather perverse incentives around thresholds of significance and positive results. Publish or perish. Perverse incentives lead to perverse statistics.

The iteration itself is sometimes viewed as the problem. The “garden of forking paths”, where the analysis depends on the data, is seen as a direct cause of some of the statistical and epistemological crises in science today.
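As a toy illustration (the menu of analyses below is hypothetical, just the kind of data-dependent choices an analyst might plausibly make), simulating pure-noise data and reporting whichever test comes out best shows how far the realized false-positive rate drifts above the nominal 5%:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_sims, n, alpha = 5_000, 50, 0.05
    false_positives = 0

    for _ in range(n_sims):
        # The null is true: both groups come from the same distribution.
        a = rng.normal(size=n)
        b = rng.normal(size=n)

        # Forking paths: several analyses chosen after seeing the data;
        # report whichever yields the smallest p-value.
        pvals = [
            stats.ttest_ind(a, b).pvalue,               # plain t-test
            stats.mannwhitneyu(a, b).pvalue,            # nonparametric variant
            stats.ttest_ind(a[a > a.min()], b).pvalue,  # drop an "outlier"
            stats.ttest_ind(a[:25], b[:25]).pvalue,     # post-hoc subgroup
        ]
        if min(pvals) < alpha:
            false_positives += 1

    print(false_positives / n_sims)  # well above the nominal 0.05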

Iteration itself isn’t inherently bad. It’s just that the objective function usually isn’t what we want from a scientific perspective.

To those actually doing scientific work, I suspect iterating on their models feels like they’re doing something unfaithful.

Furthermore, I believe a lot of these issues are strongly related to the flawed epistemological framework on which many scientific fields seem to have converged: p<0.05 means it’s true, otherwise it’s false.

edit:

Perhaps another way to characterize this discomfort is by the number of degrees of freedom that the analyst controls. In a Bayesian context where we are picking priors either by belief or previous data, the analyst has a _lot_ of control over how the results come out the other end.
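A small sketch of how much those knobs matter when the data is thin (the priors and counts are invented for illustration): two defensible-sounding priors, same data, visibly different answers.

    from scipy import stats

    # Hypothetical small study: 7 successes in 10 trials.
    n, k = 10, 7

    # Two priors an analyst could each justify with a straight face.
    skeptical  = stats.beta(1 + k, 20 + n - k)  # Beta(1, 20): effect is rare
    optimistic = stats.beta(5 + k, 1 + n - k)   # Beta(5, 1): effect is common

    # Conjugate Beta-Binomial updates; same data, different conclusions.
    print(skeptical.mean())   # ~0.26
    print(optimistic.mean())  # 0.75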

I think this is why fields have trended towards a set of ‘standard’ tests instead of building good statistical models. These take most of the knobs out of the hands of the analyst and are generally more conservative.

replies(3): >>41081904 #>>41082486 #>>41082720 #
slashdave No.41081904
In particle physics, it was quite fashionable (and may still be) to iterate on blinded data (data deliberately altered by a secret random number, and/or analyses relying entirely on Monte Carlo simulation).
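A rough sketch of the additive-offset flavor of blinding (the offset scale and the final analysis here are placeholders, not any experiment's actual procedure):

    import numpy as np

    rng = np.random.default_rng()  # seed kept away from the analysts

    def blind(data, rng):
        """Shift data by a secret offset; return blinded data plus the key."""
        offset = rng.normal(scale=5.0)
        return data + offset, offset

    raw = rng.normal(loc=1.3, scale=1.0, size=1_000)  # toy measurements
    blinded, secret_offset = blind(raw, rng)

    # Analysts iterate on cuts, fits, and systematics using `blinded` only.
    estimate_blinded = blinded.mean()

    # Unblind exactly once, after the analysis procedure is frozen.
    estimate = estimate_blinded - secret_offset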
replies(2): >>41082107 #>>41082159 #
bordercases No.41082159
Yeah, it's essentially a way to enforce parsimonious assumptions so that your output distribution can be characterized as a law.