torginus:
This sounds like compression with extra steps. What makes this technique particular to LLM weights rather than general-purpose data?
pornel:
Weights in neural networks don't always need to be precise. Not all weights are equally useful to the network. There seems to be a lot of redundancy that can be replaced with approximations.
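As a rough illustration of that tolerance (a toy sketch, not anything from the paper; the layer size and grid step below are arbitrary), coarsely rounding a random linear layer's weights only shifts its output by a few percent:

    import numpy as np

    # Toy demo: round a random "layer" of weights to a coarse grid and
    # measure how much the layer's output actually moves.
    rng = np.random.default_rng(0)
    W = rng.standard_normal((256, 256)) / 16      # weights of a small linear layer
    x = rng.standard_normal(256)                  # an input activation vector

    step = W.std() / 4                            # coarse grid, roughly 4-5 bits
    W_q = np.round(W / step) * step               # crude uniform quantization

    y, y_q = W @ x, W_q @ x
    print(np.linalg.norm(y - y_q) / np.linalg.norm(y))   # relative output error: a few percent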

This technique seems a bit similar to lossy image compression that replaces exact pixels with a combination of pre-defined patterns (the DCT in JPEG), but here the patterns come not from a cosine function but from a pseudo-random one.
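A minimal sketch of that idea (my own illustration, not the actual algorithm; the block size, basis count, and least-squares fit are all assumptions): store only a seed plus a few coefficients per block, and regenerate the pseudo-random basis from the seed at load time.

    import numpy as np

    def compress_block(weights, seed, num_basis=4):
        # Fit a block of weights with a few pseudo-random basis vectors that
        # can be regenerated from the seed, so only seed + coefficients are stored.
        rng = np.random.default_rng(seed)
        basis = rng.standard_normal((weights.size, num_basis))
        coeffs, *_ = np.linalg.lstsq(basis, weights, rcond=None)
        return seed, coeffs

    def decompress_block(seed, coeffs, block_size):
        rng = np.random.default_rng(seed)
        basis = rng.standard_normal((block_size, coeffs.size))
        return basis @ coeffs                     # approximate reconstruction

    block = np.random.default_rng(0).standard_normal(64)
    seed, coeffs = compress_block(block, seed=1234)
    approx = decompress_block(seed, coeffs, block.size)
    print(np.linalg.norm(block - approx) / np.linalg.norm(block))

With only a handful of basis vectors per 64-weight block the fit is coarse; the premise is exactly the point above, that networks tolerate this kind of approximation error.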

It may also beat simple quantization because the added noise acts as dithering, breaking up the bands created by combinations of quantized numbers.
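Rough sketch of the dithering effect (the step size and uniform noise here are arbitrary choices for illustration):

    import numpy as np

    def quantize_plain(x, step=0.1):
        return np.round(x / step) * step              # hard rounding: visible bands

    def quantize_dithered(x, step=0.1, seed=0):
        noise = np.random.default_rng(seed).uniform(-0.5, 0.5, x.shape)
        return np.round(x / step + noise) * step      # noise randomizes the rounding direction

    x = np.linspace(0.0, 0.5, 11)
    print(quantize_plain(x))      # a smooth ramp snaps onto a few repeated levels
    print(quantize_dithered(x))   # same levels, but the errors are decorrelated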