
262 points by rain1 | 10 comments
ljoshua ◴[] No.44443222[source]
Less a technical comment and more just a mind-blown comment, but I still can’t get over just how much data is compressed into and available in these downloadable models. Yesterday I was on a plane with no WiFi, but had gemma3:12b downloaded through Ollama. I was playing around with it and showing my kids, and we fired history questions at it, questions about recent video games, and some animal-fact questions. It wasn’t perfect, but holy cow, the breadth of information embedded in an 8.1 GB file is incredible! Lossy, sure, but a pretty amazing way of compressing all of human knowledge into something incredibly contained.
replies(22): >>44443263 #>>44443274 #>>44443296 #>>44443751 #>>44443781 #>>44443840 #>>44443976 #>>44444227 #>>44444418 #>>44444471 #>>44445299 #>>44445966 #>>44446013 #>>44446775 #>>44447373 #>>44448218 #>>44448315 #>>44448452 #>>44448810 #>>44449169 #>>44449182 #>>44449585 #
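
A minimal sketch of what this kind of offline setup looks like from code, using the ollama Python client library; the client library and the example prompt are assumptions, not details from the comment, and the model has to be pulled while still online (ollama pull gemma3:12b):

    # Minimal sketch: querying a locally downloaded model through the Ollama
    # Python client (pip install ollama). Assumes the local Ollama server is
    # running and the model was pulled beforehand with `ollama pull gemma3:12b`.
    # Once the ~8 GB model file is on disk, this works with no network at all.
    import ollama

    response = ollama.chat(
        model="gemma3:12b",
        messages=[
            {"role": "user", "content": "What year did the Apollo 11 mission land on the Moon?"}
        ],
    )
    print(response["message"]["content"])
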
1. nico ◴[] No.44444418[source]
For reference (according to Google):

> The English Wikipedia, as of June 26, 2025, contains over 7 million articles and 63 million pages. The text content alone is approximately 156 GB, according to Wikipedia's statistics page. When including all revisions, the total size of the database is roughly 26 terabytes (26,455 GB).

replies(3): >>44444951 #>>44448715 #>>44448846 #
2. sharkjacobs ◴[] No.44444951[source]
A better point of reference might be pages-articles-multistream.xml.bz2 (current pages only, no edit/revision history, no talk pages, no user pages), which is about 20 GB.

https://en.wikipedia.org/wiki/Wikipedia:Database_download#Wh...?

replies(1): >>44449620 #
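
A back-of-the-envelope comparison of the sizes quoted in this subthread, taking the figures at face value (156 GB of raw article text, ~20 GB for the bzip2-compressed dump, 8.1 GB for the model file):

    # Rough size comparison using the figures quoted in this thread
    # (not independently verified): raw English Wikipedia text, the
    # bzip2-compressed pages-articles dump, and the gemma3:12b model file.
    raw_text_gb = 156.0   # English Wikipedia text content, per the quote above
    bz2_dump_gb = 20.0    # pages-articles-multistream.xml.bz2
    model_gb = 8.1        # gemma3:12b as downloaded through Ollama

    print(f"bz2 dump is {raw_text_gb / bz2_dump_gb:.1f}x smaller than raw text")
    print(f"model is {raw_text_gb / model_gb:.1f}x smaller than raw text")
    print(f"model is {bz2_dump_gb / model_gb:.1f}x smaller than the bz2 dump")
    # ~7.8x, ~19.3x, and ~2.5x respectively -- though the model is lossy and
    # trained on far more than Wikipedia, so this is only a loose comparison.
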
3. mapt ◴[] No.44448715[source]
What happens if you ask this 8 GB model "Compose a realistic Wikipedia-style page on the Pokémon named Charizard"?

How close does it come?

4. pcrh ◴[] No.44448846[source]
Wikipedia itself describes its size as ~25 GB without media [0], and it's probably more accurate, with broader coverage across multiple languages, than the LLM downloaded by the GP.

[0] https://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia

replies(1): >>44449192 #
5. pessimizer ◴[] No.44449192[source]
Really? I'd assume that an LLM would deduplicate Wikipedia into something much smaller than 25GB. That's its only job.
replies(1): >>44449683 #
6. inopinatus ◴[] No.44449620[source]
this is a much more deserving and reliable candidate for any labels regarding the breadth of human knowledge.
replies(1): >>44450238 #
7. crazygringo ◴[] No.44449683{3}[source]
> That's its only job.

The vast, vast majority of LLM knowledge is not found in Wikipedia. It is definitely not its only job.

replies(1): >>44449902 #
8. Tostino ◴[] No.44449902{4}[source]
When trained on next-word prediction with the standard loss function, by definition it is its only job.
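
A toy sketch of why that framing works: the standard loss is cross-entropy, the average negative log-probability the model assigns to the true next token, and measured in base 2 it is exactly the number of bits per token an arithmetic coder driven by the model would need. The probabilities below are invented for illustration, not taken from any real model:

    # Toy sketch of the loss-as-compression view: cross-entropy in bits per
    # token is the size the text would compress to if the model's predictions
    # drove an arithmetic coder. Probabilities are made up for illustration.
    import math

    # Probability a hypothetical model assigned to each actual next token.
    predicted_probs = [0.50, 0.10, 0.80, 0.25, 0.05]

    bits_per_token = [-math.log2(p) for p in predicted_probs]
    avg_bits = sum(bits_per_token) / len(bits_per_token)

    print(f"average cross-entropy: {avg_bits:.2f} bits/token")
    # Lower loss = fewer bits per token = better compression of the training
    # text; in that sense next-token prediction and compression coincide.
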
9. wahnfrieden ◴[] No.44450238{3}[source]
it barely scratches the surface
replies(1): >>44460003 #
10. inopinatus ◴[] No.44460003{4}[source]
regarding depth, not breadth, certainly