To me, the really amazing part is that you can give the model simple instructions in plain English, like "make a list" or "write some Python code to do $x", and it follows them.
So text-only Wikipedia at 24 GB would easily hit 8 GB with many standard forms of compression, I'd think, if not better. And it would be 100% accurate, with the full text and data intact, which makes it far more usable.
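For a rough sense of what standard compressors achieve on prose, here's a minimal Python sketch comparing bz2 and lzma ratios on a sample of a local plain-text dump. The file name is hypothetical (substitute your own extract from dumps.wikimedia.org), and real ratios depend heavily on the text and the compressor settings:

```python
# Minimal sketch: compare compression ratios on a local text dump.
# PATH is hypothetical -- point it at your own plain-text extract.
import bz2
import lzma

PATH = "enwiki-extract.txt"  # hypothetical local plain-text dump

def ratio(compress, data: bytes) -> float:
    """Return compressed size as a fraction of the original."""
    return len(compress(data)) / len(data)

# Read a sample chunk rather than the whole dump to keep memory modest.
with open(PATH, "rb") as f:
    sample = f.read(64 * 1024 * 1024)  # first 64 MiB

print(f"bz2 : {ratio(bz2.compress, sample):.2%} of original")
print(f"lzma: {ratio(lzma.compress, sample):.2%} of original")
```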
It's easy to underestimate how massive 8 GB really is in terms of text, especially if you store it as ASCII instead of UTF-8 (there's a quick back-of-the-envelope sketch after the quote below).
They host a pretty decent article here: https://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia
The relevant bit:
> As of 16 October 2024, the size of the current version including all articles compressed is about 24.05 GB without media.
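To put a number on "massive", here's a back-of-the-envelope calculation in Python. The constants are illustrative assumptions, not measurements: roughly 5 characters per word plus a space, and a 100,000-word novel as the unit of comparison:

```python
# Back-of-the-envelope: how much English text fits in 8 GB of ASCII?
# Assumptions (illustrative only): ~5 chars per word plus a space,
# and a 100,000-word novel as the reference book size.
GB = 10**9
chars = 8 * GB              # 1 byte per character in ASCII
words = chars // 6          # ~6 bytes per word incl. the trailing space
novels = words // 100_000   # ~100k-word novels

print(f"{words:,} words, roughly {novels:,} novels")
# -> 1,333,333,333 words, roughly 13,333 novels
```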