
262 points by rain1 | 1 comment
mjburgess No.44442335
DeepSeek V3 is ~671B parameters, which is ~1.4 TB on disk at 16-bit precision.

All digitized books ever written/encoded compress to a few TB. The public web is ~50 TB. I think a usable zip of all publicly available English electronic text would be on the order of 100 TB. So model size is at about 1% of that, and we're in a diminishing-returns area of training, i.e., going above 1% has not yielded improvements (cf. GPT-4.5 vs GPT-4o).
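
A quick back-of-the-envelope check of those figures (a sketch assuming 16-bit weights; the ~100 TB corpus size is the rough estimate above, not a measured value):

```python
# Rough sanity check of the sizes quoted above (all figures approximate).
params = 671e9                # DeepSeek V3 parameter count
bytes_per_param = 2           # 16-bit (FP16/BF16) weights
model_tb = params * bytes_per_param / 1e12
print(f"model weights: ~{model_tb:.1f} TB")                   # ~1.3 TB

corpus_tb = 100               # guessed size of all public English text, compressed
print(f"model / corpus ratio: ~{model_tb / corpus_tb:.0%}")   # ~1%
```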

This is why compute spend is moving to inference time with "reasoning" models. It's likely we're close to diminishing returns on inference-time compute now too, hence agents, whereby (mostly) deterministic tools supplement information/capability into the system.
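
To make "deterministic tools supplementing the system" concrete, here is a minimal sketch of an agent loop; the `CALL`/`RESULT` protocol, the `calculator` tool, and the `call_llm` callback are hypothetical placeholders, not any particular framework's API:

```python
# Hypothetical agent loop: the model proposes a tool call, a deterministic
# function computes the exact answer, and the result is appended to the context.
def calculator(expr: str) -> str:
    return str(eval(expr, {"__builtins__": {}}))   # deterministic, exact arithmetic

TOOLS = {"calculator": calculator}

def agent_step(history: list[str], call_llm) -> list[str]:
    reply = call_llm(history)                      # e.g. "CALL calculator: 37*91"
    if reply.startswith("CALL "):
        name, _, arg = reply[len("CALL "):].partition(": ")
        history += [reply, f"RESULT {name}: {TOOLS[name](arg)}"]
    else:
        history.append(reply)                      # plain final answer
    return history
```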

I think to get any more value out of this model class, we'll be looking at domain-specific specialisation beyond instruction fine-tuning.

I'd guess targeting 1 TB of inference-time VRAM is a reasonable medium-term target for high-quality open-source models -- that's within the reach of most SMEs today. That's about 250B params at 4 bytes each.
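
The arithmetic behind that estimate (a sketch that ignores KV cache and activations, which also need VRAM; 4 bytes/param gives the ~250B figure, and lower precisions fit proportionally more):

```python
# Parameters that fit in a 1 TB VRAM budget at different weight precisions.
vram_bytes = 1e12                                  # 1 TB target
for bits in (32, 16, 8):
    params = vram_bytes / (bits / 8)
    print(f"{bits}-bit weights: ~{params / 1e9:.0f}B params")
# 32-bit: ~250B, 16-bit: ~500B, 8-bit: ~1000B
```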

smokel No.44442633
Simply add images and video, and these estimates start to sound like "640 KB ought to be enough for anybody".

After that, make the robots explore and interact with the world by themselves, to fetch even more data.

In all seriousness, adding image and interaction data will probably be enormously useful, even for generating text.

netcan No.44443244
Likely both will be done. I don't know what the ROI is on adding video data to text models, but it's presumably lower than for text.

There are just a lot of avenues to try at this point.

llSourcell No.44446677
No, it's not lower than text. The ROI is higher than text for understanding the physics of the world, which is exactly what video is better at than text as training data.
AstroBen No.44448134
Does that transfer, though? I'm not sure we can expect its ability to approximate physics in video form to transfer to any other mode (text, code, problem solving, etc.).
ricopags No.44448595
It depends on the hyperparameters, but one of the biggest benefits of a shared latent space is transfer between modalities.
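
A minimal sketch of that idea, in the style of CLIP-like contrastive alignment: two modality-specific encoders are projected into one latent space, so structure learned from one modality is available to the other. The dimensions and architecture here are illustrative assumptions, not any particular model:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedLatent(nn.Module):
    """Project text and video features into a single shared latent space."""
    def __init__(self, text_dim=512, video_dim=1024, latent_dim=256):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, latent_dim)
        self.video_proj = nn.Linear(video_dim, latent_dim)

    def forward(self, text_feats, video_feats):
        t = F.normalize(self.text_proj(text_feats), dim=-1)
        v = F.normalize(self.video_proj(video_feats), dim=-1)
        return t, v

def contrastive_loss(t, v, temperature=0.07):
    # Matched (text, video) pairs are pulled together and mismatched pairs pushed
    # apart, which is what forces both modalities into one shared geometry.
    logits = t @ v.T / temperature
    targets = torch.arange(len(t), device=t.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2
```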