
262 points by rain1
mjburgess No.44442335
DeepSeek V3 is ~671B parameters, which is ~1.4TB physical at 16-bit weights.

All digitized books ever written/encoded compress to a few TB. The public web is ~50TB. I think a usable zip of all English electronic text publicly available would be on the order of 100TB. So we're at about 1% of that in model size, and we're in a diminishing-returns area of training -- i.e., going above that 1% has not yielded improvements (cf. GPT-4.5 vs GPT-4o).
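A rough sketch of that arithmetic (the corpus figure is the ~100TB estimate above; 16-bit weights are an assumption on my part):

    # Back-of-envelope: model weights vs. estimated size of all public English text
    params = 671e9                 # ~670B parameters
    bytes_per_param = 2            # assuming fp16/bf16 weights
    model_tb = params * bytes_per_param / 1e12
    corpus_tb = 100                # rough estimate for "a usable zip of all English electronic text"
    print(f"weights: ~{model_tb:.2f} TB")                    # ~1.34 TB
    print(f"weights / corpus: ~{model_tb / corpus_tb:.1%}")  # ~1.3%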

This is why compute spend is moving to inference time with "reasoning" models. It's likely we're close to diminishing returns on inference-time compute now too, hence agents, whereby (mostly) deterministic tools supply additional information/capability to the system.

I think to get any more value out of this model class, we'll be looking at domain-specific specialisation beyond instruction fine-tuning.

I'd guess ~1TB of inference-time VRAM is a reasonable medium-term target for high-quality open-source models -- that's within the reach of most SMEs today. That's about 250B params.
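As a minimal sizing sketch for that 1TB figure (16-bit weights assumed; treating the remainder of the budget as KV-cache/activation headroom is my reading, not stated above):

    # Hypothetical sizing for a 1 TB inference-time VRAM budget
    vram_tb = 1.0
    params = 250e9
    bytes_per_param = 2            # assuming fp16/bf16 weights
    weights_tb = params * bytes_per_param / 1e12
    print(f"weights: ~{weights_tb:.2f} TB")                               # ~0.5 TB
    print(f"headroom (KV cache, activations): ~{vram_tb - weights_tb:.2f} TB")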

account-5 No.44442404
> All digitized books ever written/encoded compress to a few TB. The public web is ~50TB. I think a usable zip of all English electronic text publicly available would be on the order of 100TB.

Where are you getting these numbers from? I'm interested to see how that's calculated.

I read somewhere, but cannot find the source anymore, that all written text prior to this century was approximately 50MB. (I might be misquoting it, since I no longer have the source.)

kmm No.44442485
Perhaps that's meant to be 50GB (and that still seems like a serious underestimation)? Just the Bible is already 5MB.
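A quick check of that Bible figure (word count from the commonly cited estimate for the KJV; bytes per word is a rough assumption):

    # Sanity check: approximate plain-text size of the King James Bible
    words = 780_000                # commonly cited KJV word count
    bytes_per_word = 6             # ~5 characters plus a space, rough English average
    print(f"~{words * bytes_per_word / 1e6:.1f} MB")   # ~4.7 MB, in line with the ~5MB figure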
_Algernon_ No.44442513
English Wikipedia without media alone is ~24 GB compressed.

https://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia

kmm No.44444121
I don't see how the size of Wikipedia has any bearing on the parent's 50MB figure for pre-20th-century literature.