
623 points magicalhippo | 3 comments
a_bonobo ◴[] No.42621533[source]
There's a market not described here: bioinformatics.

The owner of the market, Illumina, already ships their own bespoke hardware chips in servers called DRAGEN for faster analysis of thousands of genomes. Their main market for this product is in personalised medicine, as genome sequencing in humans is becoming common.

Other companies like Oxford Nanopore use on-board GPUs to call bases (i.e., from raw electric signal coming off the sequencer to A, T, G, C) but it's not working as well as it could due to size and power constraints. I feel like this could be a huge game changer for someone like ONT, especially with cooler stuff like adaptive sequencing.

Other areas of bioinformatics, such as most day-to-day analysis software, are still very CPU- and RAM-heavy.

replies(5): >>42621586 #>>42621696 #>>42622471 #>>42623285 #>>42632656 #
mycall ◴[] No.42622471[source]
The bigger picture is that OpenAI o3/o4 plus specialized models will blow open the doors to genome tagging and discovery, but ASI kicking in is still 1 to 3 years away.
replies(1): >>42624413 #
1. nzach ◴[] No.42624413[source]
While I kinda agree with you, I don't think we will ever find a meaningful way to throw genome sequencing data at LLMs. It's simply too much data.

I worked on a project some years ago where we used genome sequencing data from a bacterium. Each sequenced sample was around 3 GB of data, and the sample size was pretty small, with only about 100 samples to study.
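
To put rough numbers on that, here is a back-of-the-envelope sketch. The 3 GB and 100 samples are the figures above. The bytes-per-token ratio and the context window size are assumptions picked purely for illustration, not properties of any particular model.

```python
# Back-of-the-envelope only; the last two constants are assumptions.
BYTES_PER_SAMPLE = 3 * 10**9   # ~3 GB per sequenced sample (figure from the comment)
NUM_SAMPLES = 100              # ~100 samples (figure from the comment)
BYTES_PER_TOKEN = 1            # assumption: roughly one byte of sequence data per token
CONTEXT_TOKENS = 200_000       # assumption: hypothetical context window size


def windows_needed(total_bytes: int, bytes_per_token: int, context_tokens: int) -> int:
    """Number of full context windows needed to read the data once."""
    total_tokens = total_bytes // bytes_per_token
    return -(-total_tokens // context_tokens)  # ceiling division


total_bytes = BYTES_PER_SAMPLE * NUM_SAMPLES
print(total_bytes / 10**9, "GB total")
print(windows_needed(total_bytes, BYTES_PER_TOKEN, CONTEXT_TOKENS), "context windows")
```

Under those assumptions that is about 300 GB, or on the order of 1.5 million context windows just to pass over the data once, for what was a small study.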

I think the real revolution will happen because code generation through LLMs will allow biologists to write 'good enough' code to transform, process and analyze data. Today, to do any meaningful work with genome data, you need a pretty competent bioinformatician, and they are a rare breed. Removing this bottleneck is what will allow us to move faster in this field.
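
As a hedged illustration of what 'good enough' code might look like, here is a minimal sketch that reads FASTA-formatted text and reports GC content per record. The function names (`read_fasta`, `gc_content`, `gc_by_record`) and the toy input are made up for this example, and it uses only the standard library instead of an established package.

```python
from io import StringIO
from typing import Dict, Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (record_name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C; 0.0 for an empty sequence."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_by_record(handle: TextIO) -> Dict[str, float]:
    """Map each record name to its GC fraction."""
    return {name: gc_content(seq) for name, seq in read_fasta(handle)}


# Toy input, invented for this example.
example = StringIO(">seq1 toy record\nATGC\nGGCC\n>seq2\nATAT\n")
print(gc_by_record(example))
```

For real data you would pass an open file handle instead of the `StringIO` toy input. The point is only that a script of this size is within reach of someone who is not a full-time bioinformatician.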

replies(2): >>42627119 #>>42628034 #
2. amelie-iska ◴[] No.42627119[source]
Just use a DNA/genomic language model like gLM2 or Evo and cross-attend it with o3 and you’re golden imo.
3. amelie-iska ◴[] No.42628034[source]
http://www.chat-protein.com/