SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators

(machinelearning.apple.com)

171 points pizza | 1 comment | 06 Apr 25 08:53 UTC

visarga ◴[06 Apr 25 10:35 UTC] No.43600416[source]▶

Very interesting trick: using a dictionary of basis vectors that are quickly computed from a seed, so they need no storage. But the result is the same 3- or 4-bit quantization, with only a slight improvement. Their tiles are small, just 8 or 12 weights, which is why the compression doesn't go very far. It would have been great if this trick lowered quantization below 1 bit/weight, but that would require longer tiles. I wonder what the limits are if we use a larger reservoir of cheap entropy as part of the neural net architecture, even in training.
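To make the seed trick concrete, here is a minimal sketch of the idea as described above, not the paper's implementation. It assumes Python's `random.Random` as a stand-in for whatever pseudo-random generator is actually used, keeps the coefficients as unquantized floats, and uses hypothetical names throughout (`basis_from_seed`, `compress_block`, `bits_per_weight`). The bit budgets in the last function are illustrative numbers chosen to land on 4 and 3 bits per weight with tiles of 8 and 12 weights; they are assumptions, not figures quoted from the paper.

```python
import random


def basis_from_seed(seed, block_len, rank):
    """Regenerate a block_len x rank matrix of +/-1 entries from a seed.

    Assumption: random.Random stands in for the real generator. The point is
    only that the basis is recomputed on demand and never stored.
    """
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(rank)]
            for _ in range(block_len)]


def solve(a, b):
    """Solve the small system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_coeffs(basis, weights, ridge=1e-6):
    """Least-squares coefficients; the tiny ridge term guards against
    duplicate basis columns making the normal equations singular."""
    rank = len(basis[0])
    gram = [[sum(row[i] * row[j] for row in basis) + (ridge if i == j else 0.0)
             for j in range(rank)] for i in range(rank)]
    rhs = [sum(row[i] * w for row, w in zip(basis, weights))
           for i in range(rank)]
    return solve(gram, rhs)


def reconstruct(seed, coeffs, block_len):
    """Decompress one tile from its seed and coefficients."""
    basis = basis_from_seed(seed, block_len, len(coeffs))
    return [sum(u * c for u, c in zip(row, coeffs)) for row in basis]


def compress_block(weights, rank, num_seeds):
    """Try every seed and keep the one with the lowest squared error.

    Returns (error, seed, coeffs).
    """
    best = None
    for seed in range(num_seeds):
        basis = basis_from_seed(seed, len(weights), rank)
        coeffs = fit_coeffs(basis, weights)
        approx = reconstruct(seed, coeffs, len(weights))
        err = sum((w - a) ** 2 for w, a in zip(weights, approx))
        if best is None or err < best[0]:
            best = (err, seed, coeffs)
    return best


def bits_per_weight(seed_bits, coeff_bits, rank, block_len, shared_bits=0):
    """Storage cost of one tile divided by the number of weights in it."""
    return (seed_bits + shared_bits + coeff_bits * rank) / block_len


if __name__ == "__main__":
    tile = [0.12, -0.40, 0.33, 0.05, -0.21, 0.48, -0.09, 0.27]
    err, seed, coeffs = compress_block(tile, rank=3, num_seeds=256)
    print("best seed:", seed, "squared error:", err)
    # Illustrative budgets (assumed): 16-bit seed, 4-bit coefficients,
    # 4 shared bits per tile.
    print(bits_per_weight(16, 4, 3, 8, shared_bits=4))
    print(bits_per_weight(16, 4, 4, 12, shared_bits=4))
```

Under these assumed budgets the arithmetic also shows why short tiles cap the compression: a 16-bit seed alone already costs 2 bits/weight on an 8-weight tile, so getting below 1 bit/weight would need tiles longer than 16 weights even before counting the coefficients.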

Congrats to Apple and Meta. It makes sense that they did the research, since this will go towards efficient serving of LLMs on phones. And it's very easy to implement.

replies(2): >>43600451 #>>43601616 #

kingsleyopara ◴[06 Apr 25 10:47 UTC] No.43600451[source]▶

>>43600416 #

I was about to post something similar. While the research is interesting, it doesn't offer any advantages over existing 3- or 4-bit quantization. I also have to assume they explored using longer tiles but found them to be ineffective, which would make sense to me from an information theory perspective.

replies(3): >>43600460 #>>43601409 #>>43603433 #

1. hedgehog ◴[06 Apr 25 17:59 UTC] No.43603433[source]▶

>>43600451 #

This technique has three significant advantages over popular low-bit quantization methods: 1) it retains more accuracy, 2) it does not require calibration data, 3) it's easier to implement in hardware.