
262 points | rain1 | 1 comment
mjburgess No.44442335
DeepSeek V3 is ~670B parameters, which is ~1.4TB physical.
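
(Back-of-envelope for that size, assuming 2-byte bf16/fp16 weights -- the precision isn't stated above, so treat this as a sketch:)

  # ~670B parameters at 2 bytes each (bf16/fp16) -- assumed precision
  params = 670e9
  bytes_per_param = 2            # fp8 would halve this, fp32 double it
  size_tb = params * bytes_per_param / 1e12
  print(f"{size_tb:.2f} TB")     # -> 1.34 TB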

All digitized books ever written/encoded compress to a few TB. The public web is ~50TB. I think a usable zip of all English electronic text publicly available would be on the order of 100TB. So model size is at about 1% of that, and we're in a diminishing-returns region of training -- i.e., going beyond 1% has not yielded improvements (cf. GPT-4.5 vs 4o).
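
(A minimal sketch of that ratio, taking the ~100TB corpus figure above as an order-of-magnitude assumption:)

  # Model weights vs. an assumed ~100TB "all public English text" corpus
  model_tb = 1.4                 # ~670B params at 2 bytes/param
  corpus_tb = 100.0              # order-of-magnitude assumption from above
  print(f"model ~= {100 * model_tb / corpus_tb:.1f}% of corpus")  # ~1.4%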

This is why compute spend is moving to inference time with "reasoning" models. It's likely we're close to diminishing returns on inference-time compute now too, hence agents, whereby (mostly) deterministic tools supply additional information and capability to the system.

I think to get any more value out of this model class, we'll be looking at domain-specific specialisation beyond instruction fine-tuning.

I'd guess ~1TB of inference-time VRAM is a reasonable medium-term target for high-quality open-source models -- that's within the reach of most SMEs today. That's about 250B params.
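
(Rough arithmetic behind that 250B figure, assuming fp16 weights and that roughly half the VRAM is left for KV cache, activations and framework overhead -- both assumptions, not measurements:)

  # Parameters that fit in ~1TB of VRAM after overhead -- all figures assumed
  vram_bytes = 1e12
  bytes_per_param = 2            # fp16/bf16 weights
  weight_fraction = 0.5          # leave ~half for KV cache / activations
  params_bn = vram_bytes * weight_fraction / bytes_per_param / 1e9
  print(f"~{params_bn:.0f}B params")  # -> ~250B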

andrepd No.44443088
> 50TB

There's no way the entire Web fits in $400 worth of hard drives.

flir No.44443206
Nah, Common Crawl puts out ~250TB a month.

Maybe text only, though...