817 points dynm | 46 comments
mg ◴[] No.43307263[source]
This is great. The author defines their own metrics, is doing their own A/B tests and publishes their interpretation plus the raw data. Imagine a world where all health blogging was like that.

Personally, I have not published any results yet, but I have been doing this type of experiment for 4 years now and have collected 48,874 data points so far. I built a simple system to do it in Vim:

https://www.gibney.org/a_syntax_for_self-tracking

I also built a bunch of tooling to analyze the data.

I think that mankind could greatly benefit from more people doing randomized studies on their own. Especially if we find a way to collectively interpret the data.

So I really applaud the author for conducting this and especially for providing the raw data.

Reading through the article and the comments here on HN, I wish there was more focus on the interpretation of the experiment. Pretty much all comments here seem to be anecdotal.

Let's look at the author's interpretation. Personally, I find that part a bit short.

They calculated 4 p-values and write:

    Technically, I did find two significant results.
I wonder what "Technically" means here. Are there "significant results" that are "better" than just "technically significant results"?
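To make "technically" concrete: with four p-values each checked at the 0.05 level, the chance that at least one comes up significant by luck alone is already substantial. A minimal sketch of the arithmetic (an illustration assuming four independent tests; the function names are mine, not the author's analysis):

```python
# Probability of at least one false positive among k independent
# tests, each run at significance level alpha (family-wise error rate).
def familywise_error(k: int, alpha: float = 0.05) -> float:
    return 1 - (1 - alpha) ** k

# A Bonferroni correction shrinks the per-test threshold so the
# family-wise rate stays near alpha.
def bonferroni_threshold(k: int, alpha: float = 0.05) -> float:
    return alpha / k

print(familywise_error(4))      # ~0.185: roughly an 18.5% chance of a fluke "hit"
print(bonferroni_threshold(4))  # 0.0125: the corrected per-test cutoff
```

So two "technically significant" results out of four uncorrected tests is weaker evidence than the raw p-values suggest.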

Then they continue:

    Of course, I don’t think this
    means I’ve proven theanine is harmful.
So what does it mean? What was the goal of collecting the data? What would the interpretation have been if the data had shown a significant positive effect of theanine?

It's great that they offer the raw data. I look forward to taking a look at it later today.

replies(14): >>43307304 #>>43307775 #>>43307806 #>>43307937 #>>43308201 #>>43308318 #>>43308320 #>>43308521 #>>43308854 #>>43309271 #>>43310099 #>>43320433 #>>43333903 #>>43380374 #
1. matthewdgreen ◴[] No.43308318[source]
This is an N=1 trial. Dressing your N=1 trial up with lots of pseudo controls and pseudo blinding and data collection does not make it better. In fact: putting this much effort into any medication trial makes it much more likely that you’re going to be incentivized to find effects that don’t exist. I think it’s nice that the author admits that they found nothing, but statistically, worthless drugs show effects in much better-designed trials than this one: it’s basically a coin toss.
replies(12): >>43308423 #>>43308440 #>>43309081 #>>43309263 #>>43309513 #>>43309704 #>>43309722 #>>43309838 #>>43311651 #>>43313233 #>>43315219 #>>43322581 #
2. episteme ◴[] No.43308423[source]
You could argue that this N is the only N that matters though.
replies(1): >>43308561 #
3. robwwilliams ◴[] No.43308440[source]
Complete injustice to this lovely study. Why do you say unblinded? Why do you insult a time series study as “dressing up with lots of data”? Would you rather see less data? Or are you volunteering to be test subject #2? Show us how to do it right Dr. M.!

In my opinion this is an exemplary N=1 study that is well designed and thoughtfully executed. It deserves accolades, not derision. And the author even recognizes possible improvements.

Unlike most large high N clinical trials this is a high resolution longitudinal trial, and it is perfectly “controlled” for genetic difference (none), well controlled for environment, and there is only one evaluator.

Compare this to the messy and mostly useless massive studies of human depression reviewed by Jonathan Flint.

https://pubmed.ncbi.nlm.nih.gov/36702864/

replies(6): >>43309100 #>>43309447 #>>43309575 #>>43312415 #>>43313221 #>>43316813 #
4. malfist ◴[] No.43308561[source]
Not to me, though. Especially not if I'm trying to decide whether to supplement l-theanine.
replies(2): >>43309069 #>>43310268 #
5. ◴[] No.43309069{3}[source]
6. wslh ◴[] No.43309081[source]
In science, an n=1 experiment isn’t discarded; instead, it adds information that can guide future experiments.
7. grafmax ◴[] No.43309100[source]
I think social media discussions of science would be better informed by the concept of a ‘hierarchy of evidence’.

Anecdotal data, n=1 trials of varying quality, correlation studies, double-blind studies (with small and large cohorts), studies without attempted replication and studies with heavy replication: they all provide evidence of varying quality and can inform the holistic scientific picture. They can all serve a purpose, such as inspiring further research or providing fodder for meta-analyses. It simply isn't true that gathered evidence ought to be casually discarded if it doesn't attain the highest levels of the hierarchy of evidence. Neither is it true that some small study showing (or not showing) some supposed effect should drastically change all our lifestyle habits. The truth lies somewhere in the middle. The concept of a hierarchy of evidence can help us navigate these apparently mixed signals so prevalent in popular science discussions.

replies(1): >>43312713 #
8. derlvative ◴[] No.43309263[source]
Noooooo you can’t just run independent experiments you need institutions and phds and bureaucracy and gold plating nooooo
replies(2): >>43310620 #>>43311503 #
9. ◴[] No.43309447[source]
10. altcognito ◴[] No.43309513[source]
Why do you say pseudo blinding? It seems like it is blind, in that the author doesn't know whether he is taking the test substance or not.

Now you can argue that there isn't enough time between samples, or that he needs more subjects, but he was blind to whether he was taking it on any given day.

replies(1): >>43309627 #
11. jrootabega ◴[] No.43309575[source]
If he said unblinded at some point, it could have been because the study author looked into the cup to determine which substance had been taken too soon. The subject should have had no knowledge of what was taken until the entire 16-month trial was over.

We should avoid extreme polarization of our judgments in general. The study deserves some amount of praise for things it did somewhat well (like the method of blinding which is clever, but not applicable to everyone), and criticism for things it did not do well, such as designing your own study methodology for your own mood. That alone will affect the results. Simply RUNNING an experiment can affect your mood because it's interesting (or even maybe frustrating). The subject probably felt pride and satisfaction whenever they used their pill selection technique, which could improve mood on its own. Neither accolades nor complete derision are appropriate, although trying to claim too strong a result from this study is kinda deserving of derision if you claim to be science-minded.

The study was well-meaning and displayed cleverness.

replies(1): >>43310475 #
12. jrootabega ◴[] No.43309627[source]
If the author felt good on a particular day for whatever reason, and then learned they had taken the active substance, their reports are contaminated forever. It works the other way, too. It works any way you slice it.
13. brothrock ◴[] No.43309704[source]
N=1 is addressed, see outcome predictions. N=1 comes with caveats, of course, but a study like this, with a proven harmless supplement, should be welcomed and praised.

It is clearly a step forward from what you can watch about theanine on YouTube or TikTok. I consider this a work of citizen science. While it should not be taken for more than it is, it’s a great example of how someone can experiment without a high burden.

replies(2): >>43310245 #>>43312350 #
14. jrootabega ◴[] No.43309722[source]
Hell, I'd say it's 0<=N<1, because it involves subjective mood reporting, and there was no participant who was not contaminated by flaws in the methodology.
15. tomrod ◴[] No.43309838[source]
Aye. Completely agree. Path dependence matters, which is why you can't just look at pre/post action.
16. ryandrake ◴[] No.43310245[source]
That's a pretty low bar though. OK, it's one step up from a monetized YouTube video that boils down to "It works--Trust me, bro." I still wouldn't really call it citizen science.
17. ryandrake ◴[] No.43310268{3}[source]
Right. The only N that you can draw any conclusions about is the author himself. So, why even publish it? The "results" are not applicable to anyone reading it. This is the health version of the software industry's "It works on my system!"
replies(1): >>43311222 #
18. robwwilliams ◴[] No.43310475{3}[source]
And that is exactly the point made in the target post by the author. He explicitly raised that criticism himself. Double kudos for self-criticism. You will not find many conventional science publications pointing out: “Shucks, we could have done this better”.
replies(1): >>43310551 #
19. jrootabega ◴[] No.43310551{4}[source]
The ancestor post is neither a "Complete injustice" nor "derision" nor an "insult", and it doesn't warrant a hostile mocking reply. Its tone could have been gentler, but it wasn't that bad. And the study doesn't really deserve "accolades", it deserves to be recognized for whatever it does well. Such polarization of tone and vocabulary doesn't accomplish much, and I'll even propose that it actually prevents good things from happening. It is good that the author is aware of, and acknowledges, the problems in the study. What other studies and journals have done wrong doesn't make the author or study more deserving of praise.

Also, you asked why he said "unblinded", and I think you now have the answer to that.

replies(1): >>43316199 #
20. smohare ◴[] No.43310620[source]
It’s not about elitism. It’s that there are so many confounding factors that even a well-informed approach makes such a study contain very little of value.

Comments like yours expose a particularly distasteful amount of hubris.

replies(1): >>43320155 #
21. episteme ◴[] No.43311222{4}[source]
“Publish” is quite a loaded word here. Almost every blog or post you read is from the point of view of a single person. There's nothing wrong with him putting his experience out into the world.
22. CamperBob2 ◴[] No.43311503[source]
These smug pilots have lost touch with the down-to-earth lives and concerns of ordinary passengers like us. Let's see a show of hands: who thinks I should fly the plane?
replies(1): >>43316712 #
23. azeirah ◴[] No.43311651[source]
Any experiment you perform on yourself has N=1. Self-science isn't as robust as gold standard double blind blablabla PhD trials, but come on.

How else are you going to find out whether a particular diet or medication works for you specifically? It's ALWAYS N=1.

replies(1): >>43311770 #
24. hartator ◴[] No.43311770[source]
And an honest n=1 is better than a double-blind n=1000 study run by an outcome-interested party. It’s so easy to make data tell the story you want.
25. matthewdgreen ◴[] No.43312350[source]
N=1 studies aren’t evil. They’re just pretty close to the entire history of pre-modern medicine that led us to bad evidence. My concern here is not that someone is sharing their opinions, it’s the fact that the person doing this explicitly heaps derision on the “placebo people” (or some other phrasing) and then heaps praise on other people doing N=1 studies and proceeds to do one. This stuff all needs to be treated with humor, good faith, and then extreme skepticism about any result it produces.
26. levocardia ◴[] No.43312415[source]
Because....

>While I was blinded during each trial, I saw the theanine/D result when I wrote it down. Over time I couldn’t help but notice that my stress dropped even when I took vitamin D, and that I was terrible at predicting what I’d taken

That is not blinding

replies(1): >>43313019 #
27. abirch ◴[] No.43312713{3}[source]
Here’s a relevant xkcd for the small sample: https://xkcd.com/882/
28. sgc ◴[] No.43313019{3}[source]
I would have taken a well-calibrated photo of the cup each time without looking, maybe with a color card in the bottom, and only entered results at the end of the trial.

Given that there is no documentation of whether the events during the hour of test time were more or less stressful than those before it, and no accounting for time of day, diet and exercise, sleep, location (quiet island or next to a construction site), etc., the data seems useless.

As a note, I have no idea why he bothered trying to guess what he had taken. What possible value could that have in this type of experiment?

Perhaps the correct course of action would be to ask for feedback in the design phase of an N=1 trial, especially a longer one, to avoid some basic mistakes.

replies(2): >>43314257 #>>43316298 #
29. throwup238 ◴[] No.43313221[source]
> Why do you say unblinded?

It’s unblinded because the subject is preparing the concoction under study. There is no way they can create a blind experiment if they’re the ones preparing the control. The placebo effect is nothing if not pernicious and cunning, able to exploit even the most subtle psychological signal - like minuscule differences in the amount of powder in a capsule.

Blinded studies have independent doctors prepare and dispense the candidate drug, so the doctors know whether it’s the real thing or a placebo but their patients don’t. In double-blinded studies, neither the doctor nor the patient has any idea what they’re getting, because a third party prepares the drugs.

30. cortesoft ◴[] No.43313233[source]
N=1 trial is great if you are trying to figure out what works for you as an individual
31. kqr ◴[] No.43314257{4}[source]
> As a note, I have no idea why he bothered trying to guess what he had taken. What possible value could that have in this type of experiment?

It gives a hint of how well the blinding worked.
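Those guesses can be put to use directly: if the subject guesses the capsule correctly much more often than chance, the blinding has failed. A sketch of an exact binomial check, stdlib only; the 50-of-94 figures below are hypothetical, not the author's actual counts:

```python
from math import comb

def binom_sf(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the one-sided p-value for
    seeing k or more correct guesses out of n under pure chance."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical numbers: 50 correct guesses in 94 blinded trials.
p_value = binom_sf(50, 94)
# A large p-value is consistent with intact blinding; a tiny one
# suggests the subject could tell the capsules apart.
```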

replies(1): >>43315493 #
32. bonestamp2 ◴[] No.43315219[source]
> This is an N=1 trial

Sure, OP addressed that and said it would be especially useful if more people did it and "if we find a way to collectively interpret the data".

33. sgc ◴[] No.43315493{5}[source]
Yes, that is a good point. But it also causes him to continuously think about indicators to determine what he has taken. He is constantly trying to pierce the veil of this "blind" study instead of doing everything he can to avoid that. Are they really the exact same weight? Do they really feel the same in his hand, mouth and throat? Does one have a slightly different taste? After all, no filling process will leave the outside of the capsule 100% free of powder. Very subtle differences could manifest themselves over 16 months.
34. robwwilliams ◴[] No.43316199{5}[source]
Yes, perhaps. But please tell me you have read the original post. It is thoughtful, self-deprecatory, careful, well analyzed, and upfront about limitations and possible improvements.

Re-reading it, such a negative critique of a solid home-brew experiment is unwarranted. There are several words here worth red flags.

>This is an N=1 trial. Dressing your N=1 trial up with lots of pseudo controls and pseudo blinding and data collection does not make it better. In fact: putting this much effort into any medication trial makes it much more likely that you’re going to be incentivized to find effects that don’t exist. I think it’s nice that the author admits that they found nothing, but statistically, worthless drugs show effects in much better-designed trials than this one: it’s basically a coin toss.

35. fc417fc802 ◴[] No.43316298{4}[source]
If your sample is blinded yet you consistently guess correctly, then presumably either 1. you failed at blinding or 2. there is a strong and discernible effect, regardless of what your other metrics might say (your other metrics could always be flawed after all).

> no documentation of whether the events during the hour of test time were more or less stressful than those before it, and no taking the time of day, diet and exercise, sleep, location

Assuming the blinded samples are uniformly randomly distributed, and assuming the study goes on long enough, then you'd expect that stuff to average out.

But I agree, it should be recorded nonetheless. That way you can verify at the end that it did, in fact, average out as you expected. If it didn't then your data is invalid.
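The verify-at-the-end idea can be sketched as a balance check: compare the mean of each logged covariate across active days and placebo days, and flag big gaps. Everything below (the "sleep hours" log, the names) is hypothetical illustration, not the study's data:

```python
import random

def group_means(covariate, assignment):
    """Mean of a logged covariate on active days vs. placebo days."""
    active = [x for x, a in zip(covariate, assignment) if a]
    placebo = [x for x, a in zip(covariate, assignment) if not a]
    return sum(active) / len(active), sum(placebo) / len(placebo)

random.seed(0)
n = 94
sleep_hours = [random.gauss(7.0, 1.0) for _ in range(n)]  # hypothetical log
assignment = [random.random() < 0.5 for _ in range(n)]    # blinded coin flips

active_mean, placebo_mean = group_means(sleep_hours, assignment)
imbalance = abs(active_mean - placebo_mean)
# If the imbalance is large relative to the covariate's spread, the
# "it averages out" assumption failed for this run, and the outcome
# data should be read with corresponding caution.
```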

replies(1): >>43328032 #
36. eru ◴[] No.43316712{3}[source]
You should link to the source of your quote.
replies(1): >>43317308 #
37. eth0up ◴[] No.43316813[source]
If you ever wonder why some folks with a fair amount of potential and something to offer keep to themselves, this isn't the worst example.

I think most people could criticize the carbon out of a corpse if they themselves weren't being criticized into one.

If we devolved from apes, maybe apes devolved from piranhas.

replies(1): >>43319369 #
38. CamperBob2 ◴[] No.43317308{4}[source]
Arguably so, but all I can find are other plagiarized copies. :-P

The New Yorker's paywall has successfully obscured the origin of the joke, so that's on them, as far as I'm concerned.

replies(1): >>43318253 #
39. eru ◴[] No.43318253{5}[source]
Yes.

Well, instead of an actual web link, you could just mention that it's from the New Yorker.

replies(1): >>43322365 #
40. robwwilliams ◴[] No.43319369{3}[source]
hilarious
41. derlvative ◴[] No.43320155{3}[source]
If you don’t like it, you don’t have to read the blog article. I assure you, you are not the intended audience. For the rest of us it provided valuable insight.
42. CamperBob2 ◴[] No.43322365{6}[source]
"It's from the New Yorker. The most important magazine of our time. Probably the most important magazine that ever was."

(From a vaguely-remembered 1990s-era ad campaign that I thought was excruciatingly self-indulgent at the time, but which evidently worked.)

43. Enginerrrd ◴[] No.43322581[source]
>Dressing your N=1 trial up with lots of pseudo controls and pseudo blinding and data collection does not make it better.

This is only an appropriate criticism in so far as you want to make conclusions about theanine as an intervention in the broader population.

It is however, perhaps much BETTER than large N trials if the author wishes to draw conclusions about how theanine affects THEM.

44. sgc ◴[] No.43328032{5}[source]
He has 94 data points. That is not nearly enough to average out so many potentially confounding variables, and there is no way to know they would. That will be the case in almost all N=1 experiments. Perhaps he takes the pill at the onset of stress, and stress almost always tends to build afterwards. This would be a probable case for many people trying to use theanine in this way. I could never accept that we should presume it would average out, and thus I consider logging potentially confounding variables essential to a valid experiment of this uncontrolled type. Your hypothesis would be much more relevant in a larger experiment, of course.
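The 94-points worry can be checked with a toy simulation (synthetic data, not the trial's): randomly split standard-normal "confounder" values into two groups and count how often the group means differ by more than 0.3 standard deviations. At n=94 such gaps are fairly common; at n=1000 they are rare.

```python
import random

def imbalance_rate(n, trials=1000, threshold=0.3, seed=1):
    """Fraction of random 50/50 splits of n standard-normal values
    whose two group means differ by more than `threshold`."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(trials):
        values = [rng.gauss(0.0, 1.0) for _ in range(n)]
        in_group = [rng.random() < 0.5 for _ in range(n)]
        a = [v for v, g in zip(values, in_group) if g]
        b = [v for v, g in zip(values, in_group) if not g]
        if not a or not b:
            continue  # degenerate split; skip it
        if abs(sum(a) / len(a) - sum(b) / len(b)) > threshold:
            exceed += 1
    return exceed / trials

small = imbalance_rate(94)    # roughly 0.15: imbalance is frequent
large = imbalance_rate(1000)  # near 0: imbalance has averaged out
```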
replies(1): >>43328233 #
45. fc417fc802 ◴[] No.43328233{6}[source]
> Perhaps he takes the pill at the onset of stress, and stress almost always tends to build afterwards.

That would not be a problem regarding the averaging I referred to, although it could well pose a problem for measurement depending on how it interacted with the selected metrics.

Note that the averaging I refer to is not regarding all possible values of some metric, but rather any discrepancy in the distribution of metrics which we expected to follow the same distribution between the sample and the control.

I think maybe there's a misunderstanding? It seems that we both agree that a variety of additional variables should be logged. I was not suggesting omitting them, but rather using discrepancies in them to detect fundamental issues with the data or study design. I would also expect larger studies to do the same where possible.

At 94 data points it is entirely possible that there would be outliers that would have averaged out for a larger N but did not. In such a scenario the presence of such outliers should then be taken to indicate a problem with the data (ie the more discrepancies you observe, the less you should trust the data).

replies(1): >>43331812 #
46. sgc ◴[] No.43331812{7}[source]
I was mainly saying that I don't understand why you were indicating we should presume or expect they would average out in an N=1 experiment. Even in much larger experiments that is not reliably the case. Science would be relatively easy if there were not a lot of noise in the real world. So, to be clear, my concern is mainly one of scale: the experiment is far too restricted to overlook this, and the smaller the experiment, the more important this type of information. Perhaps you were not saying that such averaging might be possible in an N=1 experiment and I misread, since your comment here seems to indicate a different point.

I of course agree that logging them is basic scientific methodology - in order to detect issues with the experiment, and even hopefully to see the signal through the noise.