
    Bayesian Statistics: The three cultures

    (statmodeling.stat.columbia.edu)
    309 points by luu | 19 comments
    1. tfehring ◴[] No.41081746[source]
    The author is claiming that Bayesians vary along two axes: (1) whether they generally try to inform their priors with their knowledge or beliefs about the world, and (2) whether they iterate on the functional form of the model based on its goodness-of-fit and the reasonableness and utility of its outputs. He then labels 3 of the 4 resulting combinations as follows:

        ┌───────────────┬───────────┬──────────────┐
        │               │ iteration │ no iteration │
        ├───────────────┼───────────┼──────────────┤
        │ informative   │ pragmatic │ subjective   │
        │ uninformative │     -     │ objective    │
        └───────────────┴───────────┴──────────────┘
    
    My main disagreement with this model is the empty bottom-left box - in fact, I think that's where most self-labeled Bayesians in industry fall:

    - Iterating on the functional form of the model (and therefore the assumed underlying data generating process) is generally considered obviously good and necessary, in my experience.

    - Priors are usually uninformative or weakly informative, partly because data is often big enough to overwhelm the prior.
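
    A minimal sketch of that second point with made-up numbers (mine, not the author's): a weakly informative Beta(2, 2) prior on a rate is swamped once the sample is even moderately large.

        from scipy import stats

        # Weakly informative prior on a rate: Beta(2, 2), centered at 0.5.
        a0, b0 = 2, 2

        # Toy data (assumed for illustration): 10,000 trials, 600 successes.
        n, k = 10_000, 600

        # Conjugate update: posterior is Beta(a0 + k, b0 + n - k).
        prior = stats.beta(a0, b0)
        post = stats.beta(a0 + k, b0 + n - k)

        print("prior mean:    ", prior.mean())            # 0.50
        print("posterior mean:", post.mean())             # ~0.06, dominated by the data
        print("posterior 95% interval:", post.interval(0.95))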

    The need for iteration feels so obvious to me that the entire "no iteration" column feels like a straw man. But the author, who knows far more academic statisticians than I do, explicitly says that he had the same belief and "was shocked to learn that statisticians didn’t think this way."

    replies(3): >>41081867 #>>41082105 #>>41084103 #
    2. klysm ◴[] No.41081867[source]
    The no iteration thing is very real and I don’t think it’s even for particularly bad reasons. We iterate on models to make them better, by some definition of better. It’s no secret that scientific work is subject to rather perverse incentives around thresholds of significance and positive results. Publish or perish. Perverse incentives lead to perverse statistics.

    The iteration itself is sometimes viewed directly as a problem. The “garden of forking paths”, where the analysis depends on the data, is viewed as a direct cause of some of the statistical and epistemological crises in science today.

    Iteration itself isn’t inherently bad. It’s just that the objective function usually isn’t what we want from a scientific perspective.

    I suspect that, to those actually doing scientific work, iterating on their models feels like doing something unfaithful.

    Furthermore, I believe a lot of these issues are strongly related to the flawed epistemological framework that many scientific fields seem to have converged on: p<0.05 means it’s true, otherwise it’s false.

    edit:

    Perhaps another way to characterize this discomfort is by the number of degrees of freedom that the analyst controls. In a Bayesian context where we are picking priors either by belief or previous data, the analyst has a _lot_ of control over how the results come out the other end.

    I think this is why fields have trended towards a set of ‘standard’ tests instead of building good statistical models. These take most of the knobs out of the hands of the analyst, and generally are more conservative.
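
    As a toy illustration of those degrees of freedom (my own sketch, not the commenter's): a canned two-sample t-test exposes essentially no modeling choices, while even a minimal Bayesian version of the same comparison asks the analyst to choose priors, and that choice visibly moves the answer.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(0)
        a = rng.normal(0.0, 1.0, 50)   # toy measurements, group A
        b = rng.normal(0.3, 1.0, 50)   # toy measurements, group B

        # 'Standard' route: one canned test, essentially no knobs.
        t, p = stats.ttest_ind(a, b)

        # Bayesian route (conjugate normal-normal with known noise variance, for brevity):
        # the analyst picks a prior mean and prior variance for each group mean.
        def posterior_mean(x, prior_mu, prior_var, noise_var=1.0):
            n = len(x)
            post_var = 1.0 / (1.0 / prior_var + n / noise_var)
            return post_var * (prior_mu / prior_var + x.sum() / noise_var)

        diff_flat    = posterior_mean(b, 0.0, 100.0) - posterior_mean(a, 0.0, 100.0)
        diff_skeptic = posterior_mean(b, 0.0, 0.01)  - posterior_mean(a, 0.0, 0.01)
        print(p, diff_flat, diff_skeptic)  # the skeptical prior shrinks the difference toward 0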

    replies(3): >>41081904 #>>41082486 #>>41082720 #
    3. slashdave ◴[] No.41081904[source]
    In particle physics, it was quite fashionable (and may still be) to iterate on blinded data (data deliberately altered by a secret random number, and/or relying entirely on Monte Carlo simulation).
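
    For readers unfamiliar with blinding, a toy sketch of the idea (mine, not actual particle-physics code): an undisclosed offset is added to the quantity of interest, all iteration happens on the blinded values, and the offset is removed only once the analysis is frozen.

        import numpy as np

        rng = np.random.default_rng(42)
        measurements = rng.normal(5.2, 0.3, 1_000)   # toy raw data

        # Blinding: a secret offset, generated once and kept hidden from the analysts.
        secret_offset = np.random.default_rng(7).uniform(-1.0, 1.0)
        blinded = measurements + secret_offset

        # ... iterate on cuts, fits, and systematics using `blinded` only ...
        blinded_estimate = blinded.mean()

        # Unblinding happens once, after the analysis is frozen.
        print(blinded_estimate - secret_offset)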
    replies(2): >>41082107 #>>41082159 #
    4. Onavo ◴[] No.41082105[source]
    Interesting, in my experience modern ML runs almost entirely on pragmatic Bayes. You find your ELBO, you choose the latest latent variable du jour that best models your problem domain (these days it's all transformers), and then you start running experiments.
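
    For readers who haven't met the term: the ELBO (evidence lower bound) for a model p(x, z) and variational posterior q(z) is E_q[log p(x, z)] - E_q[log q(z)], and deriving and maximizing it over q is roughly what "finding your ELBO" refers to. A crude Monte Carlo version for a toy normal model (my sketch, not the commenter's setup):

        import numpy as np
        from scipy.stats import norm

        rng = np.random.default_rng(0)
        x = rng.normal(loc=2.0, scale=1.0, size=50)   # toy observed data

        # Model: z ~ N(0, 10^2), x_i | z ~ N(z, 1); variational family q(z) = N(m, s^2).
        m, s = 1.8, 0.2

        # Monte Carlo estimate of ELBO(q) = E_q[log p(x, z)] - E_q[log q(z)].
        z = rng.normal(m, s, size=2_000)
        log_lik   = norm.logpdf(x[:, None], loc=z, scale=1.0).sum(axis=0)  # log p(x | z)
        log_prior = norm.logpdf(z, loc=0.0, scale=10.0)                    # log p(z)
        log_q     = norm.logpdf(z, loc=m, scale=s)                         # log q(z)
        print("ELBO estimate:", np.mean(log_lik + log_prior - log_q))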
    replies(1): >>41082455 #
    5. klysm ◴[] No.41082107{3}[source]
    Interesting, I wasn’t aware of that. Another thing I’ve only briefly read about is registering studies in advance, which quite literally prevents iteration.
    replies(1): >>41085898 #
    6. bordercases ◴[] No.41082159{3}[source]
    Yeah it's essentially a way to reflect parsimonious assumptions so that your output distribution can be characterized as a law.
    7. tfehring ◴[] No.41082455[source]
    I think each category of Bayesian described in the article generally falls under Breiman's [0] "data modeling" culture, while ML practitioners, even when using Bayesian methods, almost invariably fall under the "algorithmic modeling" culture. In particular, the article's definition of pragmatic Bayes says that "the model should be consistent with knowledge about the underlying scientific problem and the data collection process," which I don't consider the norm in ML at all.

    I do think ML practitioners in general align with the "iteration" category in my characterization, though you could joke that that miscategorizes people who just use (boosted trees|transformers) for everything.

    [0] https://projecteuclid.org/journals/statistical-science/volum...

    replies(1): >>41083670 #
    8. j7ake ◴[] No.41082486[source]
    Iteration is necessary for any analysis. To safeguard yourself from overfitting, be sure to have a hold-out dataset that isn’t touched until the very end.
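
    A minimal version of that discipline, assuming a scikit-learn-style workflow (the toy data and model are mine):

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split

        # Toy data standing in for the real dataset.
        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 5))
        y = (X[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)

        # Split once, up front; the hold-out set is not looked at again
        # until all iteration on the model is finished.
        X_work, X_hold, y_work, y_hold = train_test_split(X, y, test_size=0.2, random_state=0)

        # ... iterate freely on X_work / y_work (cross-validation, feature tweaks, ...) ...
        model = LogisticRegression().fit(X_work, y_work)

        # One final, honest evaluation at the very end.
        print(model.score(X_hold, y_hold))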
    replies(1): >>41084424 #
    9. joeyo ◴[] No.41082720[source]

      > Iteration itself isn’t inherently bad. It’s just that the objective
      > function usually isn’t what we want from a scientific perspective.
    
    I think this is exactly right and touches on a key difference between science and engineering.

    Science: Is treatment A better than treatment B?

    Engineering: I would like to make a better treatment B.

    Iteration is harmful for the first goal yet essential for the second. I work in an applied science/engineering field where both perspectives exist (and are necessary!). Which specific path is taken for any given experiment or analysis depends on which goal one is trying to achieve. Conflict will sometimes arise when it's not clear which of these two objectives is the important one.

    replies(2): >>41082852 #>>41089242 #
    10. jiggawatts ◴[] No.41082852{3}[source]
    There is no difference between comparing A versus B or B1 versus B2. The data collection process and the mathematical methods are (typically) identical or subject to the same issues.

    E.g.: profiling an existing application and tuning its performance is comparing two products; it just so happens that they’re different versions of the same series. If you compared it to a competing vendor’s product, you should use the same mathematical analysis process.

    replies(1): >>41085377 #
    11. nextos ◴[] No.41083670{3}[source]
    > the model should be consistent with knowledge about the problem [...] which I don't consider the norm in ML at all.

    I don't think that is so niche. Murphy's vol II, a mainstream book, starts with this quote:

    "Intelligence is not just about pattern recognition and function approximation. It’s about modeling the world." — Josh Tenenbaum, NeurIPS 2021.

    Goodman & Tenenbaum have written e.g. https://probmods.org, which is very much about modeling data-generating processes.

    The same can be said about large parts of Murphy's book, Lee & Wagenmakers or Lunn et al. (the BUGS book).

    replies(1): >>41086537 #
    12. opensandwich ◴[] No.41084103[source]
    As someone who isn't particularly well-versed in Bayesian "stuff": do Bayesian non-parametric methods fall under the "uninformative" + "iteration" approach?

    I have a feeling I'm just totally barking up the wrong tree, but don't know where my thinking/understanding is just off.

    replies(1): >>41085201 #
    13. laichzeit0 ◴[] No.41084424{3}[source]
    What about automated predictive modeling pipelines? In other words, I want the best possible point estimates only on future data. I’d think, regardless of the model selection process, I want to reestimate the parameters on the entire dataset before I deploy it, so as not to “waste” data? I.e. I want to use the hold out test data in the final model. Is this valid?
    replies(1): >>41085909 #
    14. mjburgess ◴[] No.41085201[source]
    Non-parametric models can be generically understood as parametric on order statistics.
    15. gwd ◴[] No.41085377{4}[source]
    I was kind of scratching my head at what GP was getting at as well; I suspect that "better" has a different metric in the second case: i.e., the scientist is asking which chemical, A or B, has the stronger desired medical effect; the engineer is assuming we're going with chemical B, and trying to drive down the cost of producing the chemical, improve the lifespan of the pills, decrease discomfort when administering it, increase absorption speed, tweak the absorption curve, or something like that. Those metrics are often much easier to measure than the effectiveness of the chemical itself, and much less scientifically interesting.
    16. disgruntledphd2 ◴[] No.41085898{4}[source]
    Given the set of scientific publication assumptions (predominantly p<=0.05), this can easily allow you to find whatever proof you were looking for, which is problematic.

    That being said, it's completely fair to use cross-validation and then run models on train, iterate with test and then finally calculate p-values with validation.
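
    Concretely, that split-and-confirm-once workflow might look something like the sketch below (the toy data and the plain t-test as the final check are my assumptions, not the commenter's):

        import numpy as np
        from scipy import stats
        from sklearn.model_selection import train_test_split

        # Toy data: a binary treatment and a noisy outcome.
        rng = np.random.default_rng(1)
        treated = rng.integers(0, 2, size=900)
        outcome = 0.2 * treated + rng.normal(size=900)

        # Three-way split: iterate with train and test, leave validation untouched.
        idx_train, idx_rest = train_test_split(np.arange(900), test_size=0.4, random_state=1)
        idx_test, idx_val = train_test_split(idx_rest, test_size=0.5, random_state=1)

        # ... explore and refine models on idx_train, compare candidates on idx_test ...

        # The confirmatory p-value is computed once, on the untouched validation fold.
        print(stats.ttest_ind(outcome[idx_val][treated[idx_val] == 1],
                              outcome[idx_val][treated[idx_val] == 0]).pvalue)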

    The problem with that approach is that you need to collect much, much more data than people generally would. Given that most statistical tests were developed for a small-data world, this can often work, but in some cases (medicine, particularly) it's almost impossible and you need to rely on the much less useful bootstrapping or LOO-CV approaches.

    I guess the core problem is that the methods of statistical testing assume no iteration, but actually understanding data requires iteration, so there's a conflict here.

    If the scientific industry were OK with exploratory data analyses (EDAs) being published to tease out work for future experimental studies, then we'd see more of this. But it's hard to get an EDA published, so everyone does the EDA and then rewrites the paper as though they'd expected whatever they found from the start, which is the worst of both worlds.

    17. disgruntledphd2 ◴[] No.41085909{4}[source]
    > What about automated predictive modeling pipelines? In other words, I want the best possible point estimates only on future data. I’d think, regardless of the model selection process, I want to reestimate the parameters on the entire dataset before I deploy it, so as not to “waste” data? I.e. I want to use the hold out test data in the final model. Is this valid?

    Personally, I think that as long as you're generating data constantly (through some kind of software/hardware process), then you'd be well served to keep your sets pure and build the final model only on data not used in the original process. This is often wildly impractical (and is probably controversial even within the field), but it's safer.

    (If you train on the entire internet, this may not be possible either.)
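
    For completeness, the "select on train, measure once on the hold-out, refit on everything before deploying" pattern the question describes often looks like the sketch below (the model and hyperparameters are placeholders); as noted above, the hold-out score then describes the selection procedure rather than the exact refit model.

        import numpy as np
        from sklearn.ensemble import GradientBoostingRegressor
        from sklearn.model_selection import GridSearchCV, train_test_split

        # Toy data standing in for the production dataset.
        rng = np.random.default_rng(2)
        X = rng.normal(size=(1_000, 8))
        y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=1_000)

        X_train, X_hold, y_train, y_hold = train_test_split(X, y, test_size=0.2, random_state=2)

        # Model selection happens only on the training portion.
        search = GridSearchCV(GradientBoostingRegressor(random_state=0),
                              {"max_depth": [2, 3], "n_estimators": [100, 200]},
                              cv=5)
        search.fit(X_train, y_train)

        # One honest generalization estimate from the untouched hold-out.
        print("hold-out R^2:", search.score(X_hold, y_hold))

        # The "don't waste data" step: refit the chosen configuration on everything
        # before deployment. The hold-out score above no longer describes this exact model.
        deployed = GradientBoostingRegressor(random_state=0, **search.best_params_).fit(X, y)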

    18. ttyprintk ◴[] No.41086537{4}[source]
    Archive for Goodman & Tenenbaum, since their site is flaky:

    https://archive.ph/WKLyM

    19. thyrsus ◴[] No.41089242{3}[source]
    This is how I perceived the difference: >SCIENCE< [a] create a hypothesis [b] collect all the data [c] check the hypothesis and publish; >ENGINEERING< [a] create a hypothesis [b] collect some data [c] refine the hypothesis [d] iterate over [b] and [c] until [e] PROFIT! (and maybe publish someday); the engineering approach is often better funded, allowing more data collection and better validation. If your engineering model is sufficiently deficient your product will be rejected in the market if it can even get to market. If your scientific model is sufficiently deficient, a researcher depending on that model will someday publish a refinement.