352 points by ferriswil | 10 comments
1. remexre ◴[] No.41889747[source]
Isn't this just taking advantage of "log(x) + log(y) = log(xy)"? The IEEE 754 floating-point representation stores floats as sign, mantissa, and exponent -- ignore the first two (you quantized anyway, right?), and the exponent is just an integer storing (roughly) the log2 of the float.
replies(2): >>41889800 #>>41890236 #
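A rough illustration of this (my own Python sketch, not from the paper): for positive float32 values, adding the raw bit patterns as integers and subtracting one exponent bias approximates multiplication, because the bit pattern is roughly a fixed-point log2 of the value.

    import struct

    def f2i(x):
        # reinterpret a float32 bit pattern as an unsigned 32-bit integer
        return struct.unpack('<I', struct.pack('<f', x))[0]

    def i2f(i):
        # reinterpret an unsigned 32-bit integer back into a float32
        return struct.unpack('<f', struct.pack('<I', i & 0xFFFFFFFF))[0]

    def approx_mul(x, y):
        # integer add of the bit patterns, minus the bias (127 << 23),
        # approximates x * y for positive normal floats
        return i2f(f2i(x) + f2i(y) - (127 << 23))

    print(approx_mul(3.0, 5.0))  # 14.0, vs. the exact 15.0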
2. convolvatron ◴[] No.41889800[source]
yes. and the next question is 'ok, how do we add'
replies(2): >>41889877 #>>41889991 #
3. dietr1ch ◴[] No.41889877[source]
I guess that if the bulk of the computation goes into the multiplications, you can work in log space and simply add, and when the time comes to actually do a sum in the original space you can convert back and add there.
replies(1): >>41890126 #
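A toy version of that split (my sketch; it assumes nonzero inputs and tracks signs separately): the "multiplies" become additions of logs, and you only drop back to linear space when an actual sum is needed.

    import math

    def dot_via_logs(xs, ys):
        # each product is an addition in log space; the accumulation happens
        # back in linear space, since log(a + b) has no cheap closed form
        total = 0.0
        for x, y in zip(xs, ys):
            log_prod = math.log2(abs(x)) + math.log2(abs(y))  # the "multiply"
            sign = 1.0 if (x > 0) == (y > 0) else -1.0
            total += sign * 2.0 ** log_prod  # back to linear, then add
        return total

    print(dot_via_logs([1.5, -2.0, 4.0], [3.0, 0.5, 0.25]))  # ~4.5, matches the exact dot product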
4. kps ◴[] No.41889991[source]
Yes. I haven't yet read this paper to see what exactly it says is new, but I've definitely seen log-based representations under development before now. (More log-based than the regular floating-point exponent, that is. I don't actually know the argument behind the exponent-and-mantissa form that's been pretty much universal even before IEEE754, other than that it mimics decimal scientific notation.)
5. a-loup-e ◴[] No.41890126{3}[source]
Not sure how well that would work when you're adding a bias after every layer
6. mota7 ◴[] No.41890236[source]
Not quite: It's taking advantage of (1+a)(1+b) = 1 + a + b + ab. And where a and b are both small-ish, ab is really small and can just be ignored.

So it turns the (1+a)(1+b) into 1+a+b. Which is definitely not the same! But it turns out, machine guessing apparently doesn't care much about the difference.

replies(3): >>41890382 #>>41890513 #>>41892121 #
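A quick check of how much the dropped ab term costs (my own sketch): sweep mantissa-like fractions a, b over [0, 1) and compare the exact product with the 1+a+b shortcut.

    # worst-case relative error of 1+a+b vs. (1+a)(1+b) over a coarse grid
    worst = 0.0
    for i in range(16):
        for j in range(16):
            a, b = i / 16, j / 16
            exact = (1 + a) * (1 + b)
            approx = 1 + a + b
            worst = max(worst, (exact - approx) / exact)
    print(worst)  # ~0.23 on this grid; the bound is 25% as a and b approach 1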
7. amelius ◴[] No.41890382[source]
You might as well then replace the multiplication with addition in the original network. In that case you're not even approximating anything.

Am I missing something?

replies(1): >>41893129 #
8. tommiegannert ◴[] No.41890513[source]
Plus the 2^(-l(m)) correction term.

Feels like multiplication shouldn't be needed for convergence, just monotonicity? I wonder how well it would perform if the model were actually trained the same way.
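For reference, the shape of the approximation with that correction folded in, as I read it (my own sketch; l_bits stands in for the paper's l(m), which it derives from the mantissa width):

    def l_mul(sx, ex, mx, sy, ey, my, l_bits):
        # mx, my are mantissa fractions in [0, 1); ex, ey are unbiased exponents.
        # The exact significand product is (1 + mx) * (1 + my); the approximation
        # drops the mx*my cross term and adds a fixed 2**-l_bits offset so the
        # error is roughly centered instead of always an underestimate.
        significand = 1.0 + mx + my + 2.0 ** -l_bits
        return (sx * sy) * significand * 2.0 ** (ex + ey)

    print(l_mul(1, 1, 0.5, 1, 2, 0.25, 4))  # 14.5, vs. the exact 3.0 * 5.0 = 15.0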

9. dsv3099i ◴[] No.41892121[source]
This trick is used a ton when doing hand calculations in engineering as well. It can save a lot of work.

You're going to have tolerance on the result anyway, so what's a little more error? :)

10. dotnet00 ◴[] No.41893129{3}[source]
They're applying that simplification to the bit representation (exponent and mantissa) of an 8-bit float, not to the network's values directly. The range is so small that the approximation to multiplication is going to be pretty close.