85 points | homarp | 1 comment
Lerc No.44610720
Has there been much research into slightly flawed matrix multiplications?

If you have a measure of correctness and a measure of performance, is there a maximum of correctness per unit of processing that sits somewhere below a full matrix multiply?

Obviously it can be done with precision, since that is what floating point is. But is there any approach where you can save x% of the computation and end up with fewer than x% incorrect values in the matrix multiplication?

Gradient descent wouldn't really care about a few (reliably) dud values.
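
As a concrete (toy) version of that question, here is a minimal numpy sketch: prune each row of a random weight matrix down to its k largest-magnitude entries, do the matrix-vector product with only those terms, and count how many output values land more than 5% away from the exact result. The Gaussian data, the static row-wise pruning rule, and the 5% threshold are arbitrary illustrative choices, not from any paper.

```python
# Toy experiment: drop a fraction of the multiply-adds in a matrix-vector
# product and count how many output values come out "wrong".
import numpy as np

rng = np.random.default_rng(0)
n = 1024
W = rng.standard_normal((n, n))   # random "weight" matrix
x = rng.standard_normal(n)        # input vector
exact = W @ x

for keep in (1.0, 0.5, 0.3, 0.1):
    k = max(1, int(keep * n))
    # static pruning: keep only the k largest-magnitude weights in each row
    idx = np.argsort(-np.abs(W), axis=1)[:, :k]
    approx = np.sum(np.take_along_axis(W, idx, axis=1) * x[idx], axis=1)

    cos = approx @ exact / (np.linalg.norm(approx) * np.linalg.norm(exact))
    frac_wrong = np.mean(np.abs(approx - exact) > 0.05 * np.abs(exact))
    print(f"kept {keep:5.0%} of mults  cosine={cos:.3f}  values >5% off: {frac_wrong:.0%}")
```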

replies(4): >>44610899 #>>44614746 #>>44614820 #>>44617249 #
1. kolinko No.44617249
I did research on approximate vector-matrix multiplication last year:

https://kolinko.github.io/effort/

For semi-random weights you can get down to 20-30% of the multiplications/memory reads and maintain ~0.98 cosine similarity between the approximated and the full result.
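
A rough, CPU-only numpy sketch of the flavour of the idea (a simplified stand-in, not the actual Effort algorithm, which is described at the link above): rank the individual products W[i,j]*x[j] by |W[i,j]|*|x[j]|, compute only the top 20-30% of them, and compare against the full result. On toy Gaussian data the numbers won't match the LLM figures exactly.

```python
# Simplified illustration of input-dependent multiplication skipping:
# compute only the products whose |W[i,j]| * |x[j]| ranks in the top `keep`
# fraction, zero out the rest, and compare with the full product.
import numpy as np

rng = np.random.default_rng(1)
n = 2048
W = rng.standard_normal((n, n))   # stand-in for "semi-random" weights
x = rng.standard_normal(n)
exact = W @ x

# importance of each individual product W[i, j] * x[j]
effort = np.abs(W) * np.abs(x)

for keep in (0.3, 0.2):
    cutoff = np.quantile(effort, 1.0 - keep)        # keep the top `keep` fraction
    approx = np.where(effort >= cutoff, W, 0.0) @ x

    cos = approx @ exact / (np.linalg.norm(approx) * np.linalg.norm(exact))
    print(f"{keep:.0%} of multiplications -> cosine similarity {cos:.3f}")
```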

As far as LLM inference goes, the speedup from removing multiplications is at best comparable to the speedup from quantisation (that is, you get at best a similar KL divergence score whether you remove calculations or quantise).
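
To make that comparison concrete, a hedged toy example: treat the matrix-vector output as logits and measure the KL divergence of the softmax against the exact result, once for skipping 70% of the multiplications and once for a naive per-tensor 4-bit round-to-nearest weight quantisation. Real evaluations use next-token distributions over a corpus, so the absolute numbers on random data mean little; this only shows how the metric is applied to both approximations.

```python
# Toy KL-divergence comparison: multiplication skipping vs. naive 4-bit weight
# quantisation, both measured against the exact logits on random data.
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def kl(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

rng = np.random.default_rng(2)
n = 2048
W = rng.standard_normal((n, n)) / np.sqrt(n)   # scaled so the "logits" stay sane
x = rng.standard_normal(n)
p_exact = softmax(W @ x)

# (a) skip 70% of multiplications: keep the top 30% of products by |W[i,j]|*|x[j]|
effort = np.abs(W) * np.abs(x)
W_skip = np.where(effort >= np.quantile(effort, 0.7), W, 0.0)
p_skip = softmax(W_skip @ x)

# (b) naive per-tensor 4-bit round-to-nearest quantisation of W
scale = np.abs(W).max() / 7.0
W_q = np.clip(np.round(W / scale), -8, 7) * scale
p_quant = softmax(W_q @ x)

print(f"KL(exact || skip 70% of mults): {kl(p_exact, p_skip):.4f}")
print(f"KL(exact || 4-bit quantised):   {kl(p_exact, p_quant):.4f}")
```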