817 points dynm
mg ◴[] No.43307263[source]
This is great. The author defines their own metrics, runs their own A/B tests, and publishes their interpretation plus the raw data. Imagine a world where all health blogging was like that.

Personally, I have not published any results yet, but I have been doing this type of experiment for 4 years now and have collected 48874 data points so far. I built a simple system to do it in Vim:

https://www.gibney.org/a_syntax_for_self-tracking

I also built a bunch of tooling to analyze the data.

I think that mankind could greatly benefit from more people doing randomized studies on their own. Especially if we find a way to collectively interpret the data.

So I really applaud the author for conducting this and especially for providing the raw data.

Reading through the article and the comments here on HN, I wish there was more focus on the interpretation of the experiment. Pretty much all comments here seem to be anecdotal.

Let's look at the author's interpretation. Personally, I find that part a bit short.

They calculated 4 p-values and write:

    Technically, I did find two significant results.
I wonder what "Technically" means here. Are there "significant results" that are "better" than just "technically significant results"?

Then they continue:

    Of course, I don’t think this
    means I’ve proven theanine is harmful.
So what does it mean? What was the goal of collecting the data? What would the interpretation have been if the data had shown a significant positive effect of theanine?

It's great that they offer the raw data. I look forward to taking a look at it later today.

replies(14): >>43307304 #>>43307775 #>>43307806 #>>43307937 #>>43308201 #>>43308318 #>>43308320 #>>43308521 #>>43308854 #>>43309271 #>>43310099 #>>43320433 #>>43333903 #>>43380374 #
robwwilliams ◴[] No.43308320[source]
“Technically” here could imply “not corrected for multiple tests”. But the typical qualifier I use is “nominally significant” when I don’t apply a correction for multiple tests.

Or “not using a one-tailed t-test”.
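
To make the multiple-testing reading concrete, here is a minimal sketch of how "nominally significant" results can stop being significant after correction. The four p-values are made up for illustration and are NOT the article's numbers; the function names are hypothetical helpers, not from any library.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni: reject only where p <= alpha / (number of tests)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values, alpha=0.05):
    """Holm step-down: test the sorted p-values against alpha/m, alpha/(m-1), ..."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject


# Hypothetical p-values for four tests (assumption, not the article's data).
p_values = [0.03, 0.04, 0.30, 0.60]
nominal = [p <= 0.05 for p in p_values]

print("nominal:   ", nominal)
print("bonferroni:", bonferroni(p_values))
print("holm:      ", holm(p_values))
```

With four tests, the Bonferroni threshold drops from 0.05 to 0.0125, so two results that pass the uncorrected threshold may no longer pass once the number of comparisons is accounted for.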

The most appropriate null hypothesis in this lovely study is “does theanine REDUCE anxiety”, not “does theanine change anxiety either up or down”.

What impressed me most is the suggestion for an improved experimental design to remove his temporal drift by using 100 pre-loaded envelopes and only decoding the results at the end.
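
A rough sketch of how the envelope design could be bookkept, assuming a 50/50 split between theanine and placebo and a numeric anxiety score per envelope. Both are my assumptions; the function names and the scoring are hypothetical, not the author's protocol.

```python
import random


def prepare_envelopes(n=100, seed=None):
    """Build a hidden key mapping envelope number -> contents, in random order."""
    rng = random.Random(seed)
    contents = ["theanine"] * (n // 2) + ["placebo"] * (n - n // 2)
    rng.shuffle(contents)
    return {envelope_id: c for envelope_id, c in enumerate(contents, start=1)}


def unblind(key, outcomes):
    """Group recorded scores by treatment; call this only after all data is in."""
    groups = {"theanine": [], "placebo": []}
    for envelope_id, score in outcomes.items():
        groups[key[envelope_id]].append(score)
    return groups


# The key is generated once and put away until the experiment is over.
key = prepare_envelopes(n=100, seed=0)

# During the experiment only the envelope number and the score are logged.
outcomes = {1: 2.0, 2: 3.5, 3: 1.0}  # hypothetical scores
groups = unblind(key, outcomes)
```

Because the assignment is fixed up front and decoded only at the end, the experimenter cannot adjust behavior or ratings based on interim results.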

replies(2): >>43308383 #>>43310822 #
azalemeth ◴[] No.43308383[source]
> The most appropriate null hypothesis in this lovely study is “does theanine REDUCE anxiety”, not “does theanine change anxiety either up or down”.

I disagree with this. You have a prior belief that theanine might reduce anxiety; if you wanted to you could codify that subjective belief and perform some variety of Bayesian hypothesis test [1] and compute a Bayes factor. The main reason that one-sided tests are advocated for is power; that is often the same as having a prior belief in disguise. Why not quantify it?
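
As a sketch of what "quantify it" could look like, here is a Bayes factor for the simplest possible setup: the observed mean difference is treated as normally distributed with a known standard error, the null is "no effect" (difference exactly 0), and the alternative puts a normal prior on the effect. All of this is a simplifying assumption on my part, and the numbers are invented for illustration.

```python
from statistics import NormalDist


def bayes_factor_10(observed_diff, std_error, prior_mean, prior_sd):
    """BF10 for H1: effect ~ Normal(prior_mean, prior_sd) vs H0: effect = 0.

    Under H1 the observed difference is marginally Normal with variance
    std_error**2 + prior_sd**2; under H0 it is Normal(0, std_error).
    """
    h1_sd = (std_error ** 2 + prior_sd ** 2) ** 0.5
    marginal_h1 = NormalDist(prior_mean, h1_sd).pdf(observed_diff)
    marginal_h0 = NormalDist(0.0, std_error).pdf(observed_diff)
    return marginal_h1 / marginal_h0


# Hypothetical prior: theanine lowers the anxiety score by about 0.5 points.
bf_predicted = bayes_factor_10(observed_diff=-1.0, std_error=0.3,
                               prior_mean=-0.5, prior_sd=0.5)
bf_opposite = bayes_factor_10(observed_diff=1.0, std_error=0.3,
                              prior_mean=-0.5, prior_sd=0.5)
print(bf_predicted, bf_opposite)
```

The directional belief enters through `prior_mean` rather than through the choice of tail, so a result in the unexpected direction still produces a number that can be interpreted.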

However, scientifically, if the data conclusively show that "theanine increases anxiety", that is a meaningful, non-artefactual result: it is hugely important to be sensitive to the answer 'you are wrong', and that answer may well, ironically, spur development in a direction that helps us understand what is going on. I personally think that one-sided tests are best avoided except in the case where it is physically impossible to have an effect in the other direction. Examples of this are rare, but they do occasionally exist.

[1] https://mspeekenbrink.github.io/sdam-book/ch-Bayes-factors.h...

replies(1): >>43308845 #
1. robwwilliams ◴[] No.43308845[source]
Sure. I can see your point, but the most reasonable posterior probability of the null given the biohacker community’s belief is one-tailed. This also gives more power to reject the null.
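
A small sketch of the power trade-off under discussion, using a normal (z) approximation in place of a proper t-test so that it runs on the standard library alone. The data are invented and the function name is hypothetical.

```python
from statistics import NormalDist, mean, stdev


def z_test_p_values(treatment, control):
    """Return (z, one-sided p for 'treatment < control', two-sided p)."""
    diff = mean(treatment) - mean(control)
    se = (stdev(treatment) ** 2 / len(treatment)
          + stdev(control) ** 2 / len(control)) ** 0.5
    z = diff / se
    p_less = NormalDist().cdf(z)  # H1: treatment mean is lower
    p_two = 2 * min(p_less, 1 - p_less)  # H1: means differ in either direction
    return z, p_less, p_two


# Hypothetical anxiety scores where the treatment group scores HIGHER.
treatment = [3, 4, 5, 4, 3, 4, 5, 4]
control = [2, 3, 2, 3, 2, 3, 2, 3]

z, p_less, p_two = z_test_p_values(treatment, control)
print(z, p_less, p_two)
```

When the effect goes in the hypothesized direction, the one-sided p-value is half the two-sided one, which is where the extra power comes from. When it goes the other way, as in this made-up data, the one-sided test returns a p-value near 1 and cannot flag the result, while the two-sided test can.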