How large are large language models?

(gist.github.com)

262 points rain1 | 5 comments | 02 Jul 25 10:39 UTC

ljoshua ◴[02 Jul 25 13:00 UTC] No.44443222[source]▶

>>44442072 (OP) #

Less a technical comment and more just a mind-blown comment, but I still can’t get over just how much data is compressed into and available in these downloadable models. Yesterday I was on a plane with no WiFi, but had gemma3:12b downloaded through Ollama. Was playing around with it and showing my kids, and we fired history questions at it, questions about recent video games, and some animal fact questions. It wasn’t perfect, but holy cow the breadth of information that is embedded in an 8.1 GB file is incredible! Lossy, sure, but a pretty amazing way of compressing all of human knowledge into something incredibly contained.
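
For a rough sense of scale, the 8.1 GB figure above can be turned into bits per parameter. This is only a back-of-the-envelope sketch: it assumes gemma3:12b has roughly 12 billion parameters (read off the model name, not stated in the thread) and that 1 GB means 10^9 bytes. The function name is made up for illustration.

```python
def bits_per_parameter(file_size_gb: float, params_billions: float) -> float:
    """Average number of bits the file spends on each parameter.

    Assumes 1 GB = 1e9 bytes and ignores any non-weight data in the file.
    """
    file_bits = file_size_gb * 1e9 * 8          # bytes -> bits
    return file_bits / (params_billions * 1e9)  # bits per parameter


# 8.1 GB comes from the comment above; 12.0 is an assumed parameter count.
bpp = bits_per_parameter(8.1, 12.0)
print(f"{bpp:.2f} bits per parameter")
```

Under those assumptions the result is well below 16 bits per parameter, which would be consistent with a quantized download rather than full-precision weights.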

replies(22): >>44443263 #>>44443274 #>>44443296 #>>44443751 #>>44443781 #>>44443840 #>>44443976 #>>44444227 #>>44444418 #>>44444471 #>>44445299 #>>44445966 #>>44446013 #>>44446775 #>>44447373 #>>44448218 #>>44448315 #>>44448452 #>>44448810 #>>44449169 #>>44449182 #>>44449585 #

Workaccount2 ◴[02 Jul 25 13:53 UTC] No.44443751[source]▶

>>44443222 #

I don't like the term "compression" used with transformers because it gives the wrong idea about how they function: that they are a search tool glued onto a .zip file, that your prompts are just fancy search queries, and that hallucinations are just bugs in the recall algo.

Although strictly speaking they have lots of information in a small package, they are F-tier compression algorithms because the loss is bad, unpredictable, and undetectable (i.e. a human has to check it). You would almost never use a transformer in place of any other compression algorithm for typical data compression uses.

replies(2): >>44443792 #>>44443846 #

Wowfunhappy ◴[02 Jul 25 13:56 UTC] No.44443792[source]▶

>>44443751 #

A .zip is lossless compression. But we also have plenty of lossy compression algorithms. We've just never been able to use lossy compression on text.
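
As a small illustration of what "lossless" means here, the sketch below round-trips some text through DEFLATE (the algorithm most .zip files use) via Python's standard `zlib` module and checks that every byte comes back. The sample sentence and address are invented for the example.

```python
import zlib

# Invented sample text; repetition makes it compress well.
text = ("The customer lives at 42 Example Street. " * 50).encode("utf-8")

compressed = zlib.compress(text, level=9)   # DEFLATE at maximum compression
restored = zlib.decompress(compressed)

# Lossless: the restored bytes are identical to the input.
assert restored == text
print(f"{len(text)} bytes -> {len(compressed)} bytes, round trip exact")
```

A lossy scheme gives up exactly that byte-for-byte guarantee in exchange for a smaller output.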

replies(2): >>44443983 #>>44452808 #

Workaccount2 ◴[02 Jul 25 14:12 UTC] No.44443983{3}[source]▶

>>44443792 #

>We've just never been able to use lossy compression on text.

...and we still can't. If your lawyer sent you your case files in the form of an LLM trained on those files, would you be comfortable with that? In what situation would you compress text with an LLM rather than a standard compression algo (other than to make an LLM)?

Other lossy compression targets known superfluous information. MP3 removes sounds we can't really hear, and JPEG works by grouping uniform color pixels into single chunks of color.

LLMs kind of do their own thing, and the data you get back out of them is correct, incorrect, or dangerously incorrect (i.e. plausible enough to be taken as correct), with no algorithmic way to discern which is which.

So while yes, they do compress data and you can measure it, the output of this "compression algorithm" puts it in the same family as a "randomly delete words and thesaurus long words into short words" compression algorithm, which I don't think anyone would consider using to compress their documents.

replies(3): >>44444215 #>>44445387 #>>44446919 #

esafak ◴[02 Jul 25 14:31 UTC] No.44444215{4}[source]▶

>>44443983 #

People summarize (compress) documents with LLMs all day. With legalese the application would be to summarize it in layman's terms, while retaining the original for legal purposes.

replies(1): >>44444361 #

1. Workaccount2 ◴[02 Jul 25 14:41 UTC] No.44444361{5}[source]▶

>>44444215 #

Yes, and we all know (ask teachers) how reliable those summaries are. They are randomly lossy, which makes them unsuitable for any serious work.

I'm not arguing that LLMs don't compress data. I am arguing that they are technically compression tools, but not colloquially compression tools, and the overlap they have with colloquial compression tools is almost zero.

replies(3): >>44444507 #>>44444666 #>>44444708 #

2. menaerus ◴[02 Jul 25 14:53 UTC] No.44444507[source]▶

>>44444361 (TP) #

At this moment LLMs are used for much of the serious work across the globe, so perhaps you will need to readjust your line of thinking. There's nothing inherently better or more trustworthy about having a person compile some knowledge than having, let's say, a computer algorithm do it in this case. I place my bets on the latter to have better output.

3. esafak ◴[02 Jul 25 15:06 UTC] No.44444666[source]▶

>>44444361 (TP) #

> They are randomly lossy, which makes them unsuitable for any serious work.

Ask ten people and they'll give ten different summaries. Are humans unsuitable too?

replies(1): >>44445316 #

4. Wowfunhappy ◴[02 Jul 25 15:10 UTC] No.44444708[source]▶

>>44444361 (TP) #

But lossy compression algorithms for e.g. movies and music are also non-deterministic.

I'm not making an argument about whether the compression is good or useful; I don't find 144p, bitrate-starved videos particularly useful either. But it doesn't seem so unlike other types of compression to me.

5. Workaccount2 ◴[02 Jul 25 15:58 UTC] No.44445316[source]▶

>>44444666 #

Yes, which is why we write things down, and when those archives become too big we use lossless compression on them, because we cannot tolerate a compression tool that drops the street address of a customer or even worse, hallucinates a slightly different one.
