
817 points dynm | 2 comments
mg No.43307263
This is great. The author defines their own metrics, is doing their own A/B tests and publishes their interpretation plus the raw data. Imagine a world where all health blogging was like that.

Personally, I have not published any results yet, but I have been doing this type of experiment for 4 years now and have collected 48,874 data points so far. I built a simple system to do it in Vim:

https://www.gibney.org/a_syntax_for_self-tracking

I also built a bunch of tooling to analyze the data.

I think that mankind could greatly benefit from more people doing randomized studies on their own. Especially if we find a way to collectively interpret the data.

So I really applaud the author for conducting this and especially for providing the raw data.

Reading through the article and the comments here on HN, I wish there was more focus on the interpretation of the experiment. Pretty much all comments here seem to be anecdotal.

Let's look at the author's interpretation. Personally, I find that part a bit short.

They calculated 4 p-values and write:

    Technically, I did find two significant results.
I wonder what "Technically" means here. Are there "significant results" that are "better" than just "technically significant results"?
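One plausible reading is multiple comparisons: with four tests, results that clear 0.05 individually may not survive a correction. A minimal sketch of a Bonferroni check, using hypothetical p-values (the author's actual values are in the raw data):

```python
# Bonferroni correction for multiple comparisons: with m tests,
# each p-value must clear alpha/m to stay significant overall.
def bonferroni_significant(p_values, alpha=0.05):
    m = len(p_values)
    return [p < alpha / m for p in p_values]

# Hypothetical p-values for illustration (not the author's results):
p_values = [0.03, 0.04, 0.20, 0.60]
print(bonferroni_significant(p_values))  # all False: none clears 0.05/4 = 0.0125
```

Two of these would be "technically significant" at 0.05 in isolation, yet none survives the correction.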

Then they continue:

    Of course, I don’t think this
    means I’ve proven theanine is harmful.
So what does it mean? What was the goal of collecting the data? What would the interpretation have been if the data had shown a significant positive effect of theanine?

It's great that they offer the raw data. I look forward to taking a look at it later today.

matthewdgreen No.43308318
This is an N=1 trial. Dressing your N=1 trial up with lots of pseudo controls and pseudo blinding and data collection does not make it better. In fact, putting this much effort into any medication trial makes it much more likely that you're going to be incentivized to find effects that don't exist. I think it's nice that the author admits that they found nothing, but statistically, worthless drugs show effects in much better-designed trials than this one: it's basically a coin toss.
robwwilliams No.43308440
This is a complete injustice to a lovely study. Why do you say unblinded? Why do you insult a time series study as "dressing up with lots of data"? Would you rather see less data? Or are you volunteering to be test subject #2? Show us how to do it right, Dr. M.!

In my opinion this is an exemplary N=1 study that is well designed and thoughtfully executed. It deserves accolades, not derision. And the author even recognizes possible improvements.

Unlike most large high-N clinical trials, this is a high-resolution longitudinal trial, and it is perfectly "controlled" for genetic differences (there are none), well controlled for environment, and there is only one evaluator.

Compare this to the messy and mostly useless massive studies of human depression reviewed by Jonathan Flint.

https://pubmed.ncbi.nlm.nih.gov/36702864/

levocardia No.43312415
Because....

>While I was blinded during each trial, I saw the theanine/D result when I wrote it down. Over time I couldn’t help but notice that my stress dropped even when I took vitamin D, and that I was terrible at predicting what I’d taken

That is not blinding.

sgc No.43313019
I would have taken a well-calibrated photo of the cup each time without looking, maybe with a color card in the bottom, and only entered results at the end of the trial.

Given that there is no documentation of whether the events during the hour of test time were more or less stressful than those before it, and no accounting for time of day, diet, exercise, sleep, location (quiet island or next to a construction site), etc., the data seems useless.

As a note, I have no idea why he bothered trying to guess what he had taken. What possible value could that have in this type of experiment?

Perhaps the correct course of action would be to ask for feedback in the design phase of an N=1 trial, especially a longer one, to avoid some basic mistakes.

fc417fc802 No.43316298
If your sample is blinded yet you consistently guess correctly, then presumably either (1) you failed at blinding or (2) there is a strong and discernible effect, regardless of what your other metrics might say (your other metrics could always be flawed, after all).
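This check can be made concrete with an exact binomial test on the guesses. A sketch, using hypothetical counts (94 blinded trials, 60 correct) rather than the author's actual guess record:

```python
from math import comb

def two_sided_binom_p(k, n):
    """Exact two-sided binomial p-value against chance (p = 0.5).
    Folding to the upper tail is valid because p = 0.5 is symmetric."""
    k = max(k, n - k)  # fold to the upper tail
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical: 94 blinded trials, 60 correct guesses.
# A small p-value means guessing beat chance: either blinding
# failed or the effect is strong enough to feel directly.
print(two_sided_binom_p(60, 94))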

> no documentation of whether the events during the hour of test time were more or less stressful than those before it, and no taking the time of day, diet and exercise, sleep, location

Assuming the blinded samples are uniformly randomly distributed, and assuming the study goes on long enough, you'd expect that stuff to average out.

But I agree, it should be recorded nonetheless. That way you can verify at the end that it did, in fact, average out as you expected. If it didn't, then your data is invalid.
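That end-of-study verification could look like a permutation test on a logged covariate, comparing its distribution between the two arms. The values below are hypothetical (e.g., sleep hours on theanine vs. vitamin D days):

```python
import random

def permutation_p(group_a, group_b, n_perm=10000, seed=0):
    """Permutation test for a difference in means of a logged
    covariate between the two arms of a blinded trial."""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = group_a + group_b
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a))
        if diff >= observed:
            hits += 1
    return hits / n_perm

# Hypothetical sleep-hours logs for the two arms; a small p-value
# would flag an imbalance that undermines the comparison.
theanine_days = [7.1, 6.8, 7.4, 6.9, 7.2, 7.0]
placebo_days = [7.0, 7.3, 6.7, 7.1, 6.9, 7.2]
print(permutation_p(theanine_days, placebo_days))
```

A large p-value here is consistent with the covariate having averaged out between arms; a small one would mean the arms differed on something other than the supplement.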

sgc No.43328032
He has 94 data points. That is not nearly enough to average out so many potentially confounding variables, and there is no way to know that they would. That will be the case in almost all N=1 experiments. Perhaps he takes the pill at the onset of stress, and stress almost always tends to build afterwards. This would be a probable scenario for many people trying to use theanine in this way. I could never accept that we should presume it would average out, and thus I consider logging potentially confounding variables essential to validity in this type of uncontrolled experiment. Your hypothesis would be much more relevant in a larger experiment, of course.
fc417fc802 No.43328233
> Perhaps he takes the pill at the onset of stress, and stress almost always tends to build afterwards.

That would not be a problem regarding the averaging I referred to, although it could well pose a problem for measurement depending on how it interacted with the selected metrics.

Note that the averaging I refer to is not over all possible values of some metric, but over any discrepancy between the sample and the control in metrics we expected to follow the same distribution.

I think maybe there's a misunderstanding? It seems that we both agree that a variety of additional variables should be logged. I was not suggesting omitting them, but rather using discrepancies in them to detect fundamental issues with the data or study design. I would also expect larger studies to do the same where possible.

At 94 data points it is entirely possible that there would be outliers that would have averaged out for a larger N but did not. In such a scenario, the presence of such outliers should be taken to indicate a problem with the data (i.e., the more discrepancies you observe, the less you should trust the data).
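The scale question can be put in numbers with a quick simulation: how far from balanced does a hypothetical binary confounder (say, "stressful day" at a 30% base rate) typically end up between two randomized arms, at 94 days versus a larger N?

```python
import random

def mean_chance_imbalance(n_days, rate=0.3, n_sims=2000, seed=1):
    """Simulate a binary confounder randomized across two equal arms
    and report the average absolute difference in its rate between
    arms, i.e. how far from 'averaged out' a typical run lands."""
    rng = random.Random(seed)
    total = 0.0
    half = n_days // 2
    for _ in range(n_sims):
        days = [rng.random() < rate for _ in range(n_days)]
        rng.shuffle(days)  # random assignment to the two arms
        a, b = days[:half], days[half:]
        total += abs(sum(a) / len(a) - sum(b) / len(b))
    return total / n_sims

# Chance imbalance shrinks roughly as 1/sqrt(n), so at 94 days
# it is several times larger than at 1000 days.
print(mean_chance_imbalance(94), mean_chance_imbalance(1000))
```

Nothing here depends on theanine specifically; it just illustrates that at N=94 a typical run still carries a several-percentage-point imbalance in any given confounder.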

sgc No.43331812
I was mainly saying that I don't understand why you were indicating we should presume or expect they would average out in an N=1 experiment. Even in much larger experiments that is not reliably the case. Science would be relatively easy if there were not a lot of noise in the real world. So, to be clear, my concern is mainly one of scale; the experiment is far too restricted to overlook this: the smaller the experiment, the more important this type of information. Perhaps you were not saying that such averaging might be possible in an N=1 experiment and I misread, since your comment here seems to indicate a different point.

I of course agree that logging them is basic scientific methodology - in order to detect issues with the experiment, and even hopefully to see the signal through the noise.