
An LLM is a lossy encyclopedia

(simonwillison.net)
509 points by tosh | 2 comments

(the referenced HN thread starts at https://news.ycombinator.com/item?id=45060519)
GuB-42 ◴[] No.45101186[source]
There are a lot of parallels between AI and compression.

In fact, the best compression algorithms and LLMs have in common that they work by predicting the next word. Compression algorithms take an extra step, called entropy coding, to efficiently encode the difference between the prediction and the actual data; the better the prediction, the better the compression ratio.

What makes an LLM "lossy" is that it doesn't have the "encode the difference" step.

And yes, it means you can turn an LLM into a (lossless) compression algorithm, and I think a really good one in terms of compression ratio on huge data sets. You can also turn a compression algorithm like gzip into a language model! A terrible one, but the output is better than a random stream of bytes.
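
As a rough sketch of what the gzip-as-a-language-model idea could look like (the function name and the prompt below are made up for illustration): score each candidate next byte by how little it grows the compressed context, and treat "compresses well" as "predicted".

    import gzip

    def gzip_next_byte_scores(context: bytes):
        # Score every possible next byte by how many extra bytes it adds
        # to the gzip-compressed context. Smaller = more "predictable",
        # which makes this a very crude next-byte model.
        base = len(gzip.compress(context))
        return {b: len(gzip.compress(context + bytes([b]))) - base
                for b in range(256)}

    context = b"the quick brown fox jumps over the lazy "
    scores = gzip_next_byte_scores(context)
    best = min(scores, key=scores.get)
    print(repr(chr(best)), scores[best])

Greedily appending the lowest-scoring byte over and over gives the kind of output described above: terrible, but noticeably less random than a random stream of bytes.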

replies(3): >>45101276 #>>45102534 #>>45103227 #
1. arjvik ◴[] No.45101276[source]
With a handy trick called arithmetic coding, you can actually turn an LLM into a lossless compression algorithm!
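
A minimal sketch of that trick, assuming a hypothetical prob_fn(context) that returns the model's next-token distribution (in practice the LLM's softmax output). A real coder would use integer ranges with renormalization rather than the floats below, but the interval-narrowing idea is the same:

    def arithmetic_encode(tokens, prob_fn):
        # Narrow [low, high) once per token to the slice that the actual
        # token occupies in the model's predicted distribution. The final
        # interval's width is the product of the probabilities the model
        # assigned, so a better model needs fewer bits (about -log2(width)).
        low, high = 0.0, 1.0
        for i, tok in enumerate(tokens):
            probs = prob_fn(tokens[:i])  # model's prediction for position i
            span = high - low
            cum = 0.0
            for t in sorted(probs):
                if t == tok:
                    low, high = low + span * cum, low + span * (cum + probs[t])
                    break
                cum += probs[t]
        return (low + high) / 2  # any number in [low, high) identifies the sequence

    # Toy stand-in for an LLM: a uniform distribution over two symbols.
    uniform = lambda context: {"a": 0.5, "b": 0.5}
    print(arithmetic_encode("abba", uniform))  # interval width 1/16 -> about 4 bits

Decoding replays the same model on the growing context and picks, at each step, whichever token's slice contains the encoded number, which is why the encoder and decoder must share exactly the same model.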
replies(1): >>45101585 #
2. vbarrielle ◴[] No.45101585[source]
Indeed, see https://bellard.org/nncp/ for an example.