ljoshua No.44443222
Less a technical comment and more just a mind-blown comment, but I still can’t get over just how much data is compressed into and available in these downloadable models. Yesterday I was on a plane with no WiFi, but had gemma3:12b downloaded through Ollama. Was playing around with it and showing my kids, and we fired history questions at it, questions about recent video games, and some animal fact questions. It wasn’t perfect, but holy cow the breadth of information that is embedded in an 8.1 GB file is incredible! Lossy, sure, but a pretty amazing way of compressing all of human knowledge into something incredibly contained.
rain1 No.44443274
It's extremely interesting how powerful a language model is at compression.

When you train it to be an assistant model, it becomes better at compressing assistant transcripts than it is at compressing general text.

There is an eval I have a lot of interest in and respect for, UncheatableEval (https://huggingface.co/spaces/Jellyfish042/UncheatableEval), which tests how good a language model an LLM is by applying it to a range of compression tasks.

This task is essentially impossible to 'cheat'. Compression is a benchmark you cannot game!
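
To make the link concrete: a model's cross-entropy on a text is, up to rounding, the size an arithmetic coder driven by that model's next-token probabilities would compress it to. Here's a rough sketch of computing a bits-per-byte score in that spirit (assuming the Hugging Face transformers API, with gpt2 as a small stand-in rather than the models the eval actually runs):

    import math

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # gpt2 is a small stand-in here; UncheatableEval scores much larger models.
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    def bits_per_byte(text: str) -> float:
        # Cross-entropy of the text under the model, expressed per byte.
        # An ideal arithmetic coder driven by the model's next-token
        # probabilities would compress the text to about this many bits/byte.
        ids = tokenizer(text, return_tensors="pt").input_ids
        with torch.no_grad():
            # labels=ids makes the model return the mean cross-entropy
            # (in nats) over the seq_len - 1 predicted tokens
            mean_nats = model(ids, labels=ids).loss.item()
        total_bits = mean_nats * (ids.shape[1] - 1) / math.log(2)
        return total_bits / len(text.encode("utf-8"))

    print(bits_per_byte("The quick brown fox jumps over the lazy dog."))

Lower is better, and the only way to push the number down is to actually model the text better, which is why this kind of benchmark is so hard to game.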

MPSimmons No.44443457
Agreed. It's basically lossy compression of everything it's ever read. Quantization adds to that lossiness, but since a lot of text is super fluffy, we tend not to notice it the way we notice lossy compression in, say, music.
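
As a toy illustration of the extra loss quantization adds (a numpy sketch of symmetric per-tensor int8 quantization, not any particular library's scheme):

    import numpy as np

    rng = np.random.default_rng(0)
    weights = rng.normal(0.0, 0.02, size=10_000).astype(np.float32)  # toy "layer"

    # Symmetric per-tensor int8: map the float range onto [-127, 127].
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    restored = q.astype(np.float32) * scale  # what inference actually sees

    err = np.abs(weights - restored)
    print(f"storage: 4x smaller, max error {err.max():.2e}, mean error {err.mean():.2e}")

The weights come back slightly wrong everywhere, at a quarter of the storage, and mostly the text output still reads fine.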
arcticbull No.44450147
I've been referring to LLMs as JPEG for all the world's data, and people have really started to come around to it. Initially most folks tended to outright reject this comparison.
simonw No.44450341
Ted Chiang wrote a great piece about that: https://www.newyorker.com/tech/annals-of-technology/chatgpt-...

I think it's a solid description for a raw model, but it's less applicable once you start combining an LLM with better context and tools.

What's interesting to me isn't the stuff the LLM "knows" - it's how well an LLM system can serve me when combined with RAG and tools like web search and access to a compiler.
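
The core loop of such a system is tiny. A sketch, where search and llm are hypothetical stand-ins for a real search tool and a real model call:

    # `search` and `llm` are hypothetical stand-ins, not a specific API.
    def answer(question, search, llm):
        docs = search(question, top_k=3)  # fresh facts come from outside the model
        context = "\n\n".join(doc["text"] for doc in docs)
        prompt = (
            "Answer using only the sources below. "
            "Say so if they don't cover it.\n\n"
            f"Sources:\n{context}\n\nQuestion: {question}"
        )
        return llm(prompt)  # the model contributes language skill, not stored facts

In that setup the model's job shifts from recalling facts to reading, synthesizing, and citing them.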

The most interesting developments right now are models like Gemma 3n, which are designed to have as much capability as possible without needing a huge amount of "facts" baked into them.