85 points | homarp | 1 comment
Lerc No.44610720
Has there been much research into slightly flawed matrix multiplications?

If you have a measure of correctness and a measure of performance, is there a maximum of correctness per unit of processing that sits somewhere below a full matrix multiply?

Obviously it can be done with precision, since that is what floating point is. But is there any approach where you can save x% of the computation and end up with fewer than x% incorrect values in the matrix multiplication?

Gradient descent wouldn't really care about a few (reliably) dud values.
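
As a concrete (toy) version of that question, here is a minimal numpy sketch: prune each row of a random weight matrix down to its k largest-magnitude entries, do the matrix-vector product with only those terms, and count how many output values land more than 5% away from the exact result. The Gaussian data, the static row-wise pruning rule, and the 5% threshold are arbitrary illustrative choices, not from any paper.

```python
# Toy experiment: drop a fraction of the multiply-adds in a matrix-vector
# product and count how many output values come out "wrong".
import numpy as np

rng = np.random.default_rng(0)
n = 1024
W = rng.standard_normal((n, n))   # random "weight" matrix
x = rng.standard_normal(n)        # input vector
exact = W @ x

for keep in (1.0, 0.5, 0.3, 0.1):
    k = max(1, int(keep * n))
    # static pruning: keep only the k largest-magnitude weights in each row
    idx = np.argsort(-np.abs(W), axis=1)[:, :k]
    approx = np.sum(np.take_along_axis(W, idx, axis=1) * x[idx], axis=1)

    cos = approx @ exact / (np.linalg.norm(approx) * np.linalg.norm(exact))
    frac_wrong = np.mean(np.abs(approx - exact) > 0.05 * np.abs(exact))
    print(f"kept {keep:5.0%} of mults  cosine={cos:.3f}  values >5% off: {frac_wrong:.0%}")
```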

replies(4): >>44610899 #>>44614746 #>>44614820 #>>44617249 #
1. kolinko No.44617249
I did research on approximate vector-matrix multiplication last year:

https://kolinko.github.io/effort/

For semi-random weights you can get down to 20-30% of the multiplications/memory reads and maintain ~0.98 cosine similarity between the approximated and the full result.
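
A rough, CPU-only numpy sketch of the flavour of the idea (a simplified stand-in, not the actual Effort algorithm, which is described at the link above): rank the individual products W[i,j]*x[j] by |W[i,j]|*|x[j]|, compute only the top 20-30% of them, and compare against the full result. On toy Gaussian data the numbers won't match the LLM figures exactly.

```python
# Simplified illustration of input-dependent multiplication skipping:
# compute only the products whose |W[i,j]| * |x[j]| ranks in the top `keep`
# fraction, zero out the rest, and compare with the full product.
import numpy as np

rng = np.random.default_rng(1)
n = 2048
W = rng.standard_normal((n, n))   # stand-in for "semi-random" weights
x = rng.standard_normal(n)
exact = W @ x

# importance of each individual product W[i, j] * x[j]
effort = np.abs(W) * np.abs(x)

for keep in (0.3, 0.2):
    cutoff = np.quantile(effort, 1.0 - keep)        # keep the top `keep` fraction
    approx = np.where(effort >= cutoff, W, 0.0) @ x

    cos = approx @ exact / (np.linalg.norm(approx) * np.linalg.norm(exact))
    print(f"{keep:.0%} of multiplications -> cosine similarity {cos:.3f}")
```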

As far as LLM inference goes, the speedup from removing multiplications is at best comparable to the speedup from quantisation (that is, you get at best a similar KL divergence score whether you remove calculations or quantise).
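
To make that comparison concrete, a hedged toy example: treat the matrix-vector output as logits and measure the KL divergence of the softmax against the exact result, once for skipping 70% of the multiplications and once for a naive per-tensor 4-bit round-to-nearest weight quantisation. Real evaluations use next-token distributions over a corpus, so the absolute numbers on random data mean little; this only shows how the metric is applied to both approximations.

```python
# Toy KL-divergence comparison: multiplication skipping vs. naive 4-bit weight
# quantisation, both measured against the exact logits on random data.
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def kl(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

rng = np.random.default_rng(2)
n = 2048
W = rng.standard_normal((n, n)) / np.sqrt(n)   # scaled so the "logits" stay sane
x = rng.standard_normal(n)
p_exact = softmax(W @ x)

# (a) skip 70% of multiplications: keep the top 30% of products by |W[i,j]|*|x[j]|
effort = np.abs(W) * np.abs(x)
W_skip = np.where(effort >= np.quantile(effort, 0.7), W, 0.0)
p_skip = softmax(W_skip @ x)

# (b) naive per-tensor 4-bit round-to-nearest quantisation of W
scale = np.abs(W).max() / 7.0
W_q = np.clip(np.round(W / scale), -8, 7) * scale
p_quant = softmax(W_q @ x)

print(f"KL(exact || skip 70% of mults): {kl(p_exact, p_skip):.4f}")
print(f"KL(exact || 4-bit quantised):   {kl(p_exact, p_quant):.4f}")
```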