279 points bookofjoe | 1 comment
biotechbio No.44609723
Some thoughts on this as someone working on circulating-tumor DNA for the last decade or so:

- Sure, cancer can develop years before diagnosis. Pre-cancerous clones harboring somatic mutations can exist for decades before transformation into malignant disease.

- The eternal challenge in ctDNA is achieving a "useful" sensitivity and specificity. For example, imagine you take some of your blood, extract the DNA floating in the plasma, hybrid-capture enrich for DNA in cancer driver genes, sequence super deep, call variants, do some filtering to remove noise and whatnot, and then you find some low allelic fraction mutations in TP53. What can you do about this? I don't know. Many of us have background somatic mutations speckled throughout our bodies as we age. Over age ~50, most of us are liable to have some kind of pre-cancerous clone in the esophagus, prostate, or blood (due to CHIP). Many of the popular MCED tests (e.g. Grail's Galleri) use signals other than mutations (e.g. methylation status) to improve this sensitivity / specificity profile, but I'm not convinced it's actually good enough to be useful at the population level. (A rough sketch of this kind of variant triage follows this list.)

- Most follow-on screening is not cost-effective given the sensitivity-specificity profile of current MCED assays (Grail would disagree). To make it viable, we would need things like drastically cheaper downstream screening, or possibly a tiered non-invasive screening strategy with increasing specificity at each stage (e.g. Harbinger Health). (A back-of-the-envelope positive-predictive-value calculation also follows below.)
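To make the triage problem above concrete, here is a minimal sketch, assuming variant calls with gene, allele fraction, and depth are already in hand. The gene list, thresholds, and the VariantCall/triage names are illustrative assumptions, not a validated ctDNA pipeline:

    # Illustrative only: gene list, thresholds, and fields are assumptions
    # for this sketch, not a validated ctDNA pipeline.
    from dataclasses import dataclass

    # Genes where low-AF variants in plasma are often attributable to clonal
    # hematopoiesis (CHIP) or other age-related clones rather than a solid tumor.
    CHIP_PRONE_GENES = {"DNMT3A", "TET2", "ASXL1", "JAK2", "TP53"}

    @dataclass
    class VariantCall:
        gene: str
        allele_fraction: float  # fraction of plasma reads supporting the variant
        depth: int              # total reads covering the site

    def triage(call: VariantCall, min_depth: int = 5000, noise_af: float = 0.001) -> str:
        """Rough triage of a plasma variant: noise, possible CHIP, or tumor-suspect."""
        if call.depth < min_depth or call.allele_fraction < noise_af:
            return "likely sequencing noise"
        if call.gene in CHIP_PRONE_GENES and call.allele_fraction < 0.02:
            return "possible CHIP / age-related clone"
        return "tumor-suspect (still needs orthogonal confirmation)"

    print(triage(VariantCall(gene="TP53", allele_fraction=0.004, depth=20000)))

The point of the example is the middle branch: a low-AF TP53 hit in plasma may just as easily be an age-related clone as a tumor signal, which is exactly the ambiguity described above.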

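And for the cost-effectiveness point, the binding constraint is positive predictive value at population prevalence. A minimal sketch with made-up numbers (not any vendor's published performance):

    # Back-of-the-envelope PPV for a blood-based screen; sensitivity,
    # specificity, and prevalence are illustrative numbers only.
    def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
        true_pos = sensitivity * prevalence
        false_pos = (1.0 - specificity) * (1.0 - prevalence)
        return true_pos / (true_pos + false_pos)

    # 50% sensitivity, 99.5% specificity, 1% prevalence -> PPV ~0.50:
    # about half of positives are false, and every positive still needs
    # an expensive downstream workup (imaging, biopsy, ...).
    print(f"PPV at 1.0% prevalence: {ppv(0.5, 0.995, 0.010):.2f}")
    print(f"PPV at 0.3% prevalence: {ppv(0.5, 0.995, 0.003):.2f}")

Even a very specific test produces a sizeable fraction of false positives at ~1% prevalence, and PPV drops quickly at lower prevalence; a cheap tiered second test with higher specificity is one way to change that arithmetic.
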
ajb No.44610758
Here's what may seem like an unrelated question in response: how can we get 10^7+ bits of information out of the human body every day?

There are a lot of companies right now trying to apply AI to health, but what they are ignoring is that there is orders of magnitude less health data per person than there are cat pictures. (My phone probably contains 10^10 bits of cat pictures and my health record probably 10^3 bits, if that.) But it's not wrong to try to apply AI, because we know that all processes leak information, including biological ones; and ML is a generic tool for extracting signal from noise, given sufficient data.

But our health information-gathering systems are engineered to test very specific individual hypotheses generated by experts, which require high-quality measurements of specific individual metrics that some expert, such as yourself, has figured may be relevant. So we get high-quality data, in very small quantities: a few bits per measurement.

Suppose you invent a new cheap sensor for extracting large (10^7+ bits/day) quantities of information about human biochemistry, perhaps from excretions, or blood. You run a longitudinal study collecting this information from a cohort and start training a model to predict every health outcome.

What are the properties of the bits collected by such a sensor that would make such a process likely to work out? The bits need to be "sufficiently heterogeneous" (but not necessarily independent) and their indexes need to be sufficiently stable (in some sense). What is not required is for specific individual data items to be measured with high quality, because some information about the signal we're interested in (even though we don't know exactly what it is) will leak into the other measurements.
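A toy illustration of that last point, as a sketch: purely synthetic data, and the pooling weights are assumed known here, whereas a real model would have to learn them from outcomes, which is what the longitudinal study is for.

    # Toy demonstration: many individually low-quality measurements, none
    # measured precisely, can still recover a latent signal once pooled.
    import numpy as np

    rng = np.random.default_rng(0)
    n_people, n_features = 500, 2000

    latent = rng.normal(size=n_people)                     # the "health state" we care about
    loadings = rng.normal(size=n_features)                 # how each cheap measurement leaks it
    noise = rng.normal(size=(n_people, n_features)) * 10   # each measurement is mostly noise
    X = np.outer(latent, loadings) + noise

    # Any single feature barely correlates with the latent signal...
    single = abs(np.corrcoef(X[:, 0], latent)[0, 1])

    # ...but pooling across all features recovers it well (weights assumed
    # known here; a real model would learn them from outcomes).
    pooled = X @ loadings / (loadings @ loadings)
    combined = abs(np.corrcoef(pooled, latent)[0, 1])

    print(f"one feature:  r = {single:.2f}")    # small
    print(f"all features: r = {combined:.2f}")  # close to 1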

I predict that designs for such sensors, which cheaply perform large numbers of low-quality measurements, would result in breakthroughs in detection and treatment, by allowing ML to be applied to the problem effectively.

1. standingca No.44611051
Or perhaps even routine bloodwork could incorporate some form of sequencing and longitudinal data banking. Deep sequencing, which may still be too expensive, generates tons of data that can be useful for things we don't even know to look for today; capturing this data could let us retroactively identify meaningful biomarkers or early signals once we have better techniques. That way, each time models/methods improve, prior data becomes newly valuable. Perhaps the same could be said of raw data/readings from instruments running standard tests as well (as opposed to just the final results).

I'd be really curious to see how longitudinal results of sequencing + data banking, plus other routine bloodwork, could lead to early detection and better health outcomes.
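As a very rough sketch of what that banking loop might look like (the storage layout and the shape of new_method are hypothetical, not any existing system's API): store raw reads/instrument output per person and draw date, then sweep every improved method over the whole bank.

    # Hypothetical sketch of "bank raw data now, re-analyze later".
    from pathlib import Path
    from typing import Callable, Dict, Tuple

    BANK = Path("biobank")  # raw reads / instrument output, keyed by person and draw date

    def bank_sample(person_id: str, draw_date: str, raw_bytes: bytes) -> None:
        """Store the raw output of a blood draw so future methods can revisit it."""
        dest = BANK / person_id / draw_date
        dest.mkdir(parents=True, exist_ok=True)
        (dest / "raw.bin").write_bytes(raw_bytes)

    def reanalyze_all(new_method: Callable[[bytes], dict]) -> Dict[Tuple[str, str], dict]:
        """When a better caller or model ships, sweep it over every banked draw."""
        results = {}
        for raw in BANK.glob("*/*/raw.bin"):
            person_id, draw_date = raw.parent.parent.name, raw.parent.name
            results[(person_id, draw_date)] = new_method(raw.read_bytes())
        return results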