
352 points by ferriswil | 1 comment
djoldman ◴[] No.41889903[source]
https://arxiv.org/abs/2410.00907

ABSTRACT

Large neural networks spend most computation on floating point tensor multiplications. In this work, we find that a floating point multiplier can be approximated by one integer adder with high precision. We propose the linear-complexity multiplication (L-Mul) algorithm that approximates floating point number multiplication with integer addition operations. Compared to 8-bit floating point multiplication, the proposed method achieves higher precision while consuming significantly less bit-level computation. Since multiplying floating point numbers requires substantially higher energy than integer addition operations, applying the L-Mul operation in tensor processing hardware can potentially reduce the energy cost of elementwise floating point tensor multiplications by 95% and of dot products by 80%. We calculated the theoretical error expectation of L-Mul, and evaluated the algorithm on a wide range of textual, visual, and symbolic tasks, including natural language understanding, structural reasoning, mathematics, and commonsense question answering. Our numerical analysis experiments agree with the theoretical error estimation, which indicates that L-Mul with a 4-bit mantissa achieves precision comparable to float8 e4m3 multiplication, and L-Mul with a 3-bit mantissa outperforms float8 e5m2. Evaluation results on popular benchmarks show that directly applying L-Mul to the attention mechanism is almost lossless. We further show that replacing all floating point multiplications with 3-bit-mantissa L-Mul in a transformer model achieves precision equivalent to using float8 e4m3 as accumulation precision in both fine-tuning and inference.
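The core trick behind the abstract can be sketched as follows. This is a simplified illustration, not the paper's exact L-Mul (the paper adds a small offset term to the mantissa to reduce bias): for positive normal IEEE-754 floats, adding the raw bit patterns as integers and subtracting the exponent bias approximates multiplication, because the exponents add and the mantissa sum approximates (1+m_a)(1+m_b) ≈ 1+m_a+m_b.

```python
import struct

def f32_bits(x: float) -> int:
    """Reinterpret a float32 as its raw 32-bit integer pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(b: int) -> float:
    """Reinterpret a 32-bit integer pattern as a float32."""
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]

BIAS = 127 << 23  # exponent bias, shifted to the exponent field

def approx_mul(a: float, b: float) -> float:
    # One integer addition: exponents add (bias subtracted once),
    # mantissas add, and a carry out of the mantissa field bumps
    # the exponent. Valid for positive normal floats only.
    return bits_f32(f32_bits(a) + f32_bits(b) - BIAS)

print(approx_mul(2.0, 4.0))    # exact when mantissas are zero: 8.0
print(approx_mul(1.5, 2.25))   # 3.25; true product is 3.375
```

The dropped cross term m_a*m_b is what bounds the relative error; the paper's mantissa offset compensates for its expected value.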

replies(3): >>41890324 #>>41892025 #>>41901112 #
onlyrealcuzzo ◴[] No.41890324[source]
Does this mean you can train efficiently without GPUs?

Presumably there will be a lot of interest.

replies(2): >>41890353 #>>41901656 #
crazygringo ◴[] No.41890353[source]
No. But it does potentially mean that either current or future-tweaked GPUs could run a lot more efficiently -- meaning much faster or with much less energy consumption.

You still need the GPU parallelism though.

replies(2): >>41890621 #>>41893598 #
fuzzfactor ◴[] No.41890621[source]
I had a feeling it had to be something like this: massive waste due to a misguided feature of the algorithms that shouldn't have been there in the first place.

Once the "math is done", it quite likely would have paid off better than most investments for the top people to have spent a few short years working with grossly underpowered hardware, getting amazing results there before scaling up, rather than reaching for grossly overpowered hardware before there was even a deep understanding of the underlying processes.

When you think about it, what we have seen from the latest ultra-high-powered "thinking" machines is truly impressive. But if you are trying to fool somebody into believing it's a real person, it's still not "quite" there.

Maybe a good benchmark would be to take a regular PC, and without any reliance on AI just pull out all the stops and put all the effort into fakery itself. No holds barred, any trick you can think of. See what the electronics is capable of this way. There are some smart engineers; this would only take a few years, and it looks like it would have been a lot more affordable.

Then with the same hardware if an AI alternative is not as convincing, something has got to be wrong.

It's good to find out this type of thing before you go overboard.

Regardless of speed or power, I never could have gotten an 8-bit computer to match the output of a 32-bit floating-point algorithm by using floating-point myself. Integers all the way and place the decimal where it's supposed to be when you're done.
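The integer-only approach described above is fixed-point arithmetic: scale every value by a power of two, do plain integer math, and shift the result back. A minimal sketch (the Q8.8 format and helper names here are illustrative, not from the comment):

```python
# Q8.8 fixed point: each value is stored as round(x * 256),
# so 8 bits of integer part and 8 bits of fraction.
SCALE = 256

def to_fix(x: float) -> int:
    return int(round(x * SCALE))

def from_fix(r: int) -> float:
    return r / SCALE

def fix_mul(a: int, b: int) -> int:
    # Integer multiply doubles the scale factor, so divide it
    # back out once ("place the decimal where it belongs").
    return (a * b) // SCALE

a, b = to_fix(3.25), to_fix(1.5)
print(from_fix(fix_mul(a, b)))  # 4.875
```

On an actual 8-bit machine the multiply would be done in 16-bit pieces, but the bookkeeping is the same: only integer operations, with the binary point tracked by convention.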

Once it's really figured out, how do you think it would feel being the one paying the electric bills up until now?

replies(5): >>41890824 #>>41891053 #>>41892039 #>>41892366 #>>41895079 #
jimmaswell ◴[] No.41890824[source]
Faster progress was absolutely worth it. Spending years agonizing over theory to save a bit of electricity would have been a massive disservice to the world.
replies(2): >>41890834 #>>41891732 #
rossjudson ◴[] No.41891732[source]
You're sort of presuming that LLMs are going to be a massive service to the world there, aren't you? I think the jury is still out on that one.
replies(1): >>41891845 #
jimmaswell ◴[] No.41891845[source]
They already have been. Even just in programming, even just Copilot has been a life changing productivity booster.
replies(5): >>41891944 #>>41892396 #>>41892532 #>>41894732 #>>41900955 #
recursive ◴[] No.41892532[source]
I've been using copilot for several months. If I could figure out a way to measure its impact on my productivity, I'd probably see a single digit percentage boost in "productivity". This is not life-changing for me. And for some tasks, it's actually worse than nothing. As in, I spend time feeding it a task, and it just completely fails to do anything useful.
replies(3): >>41892918 #>>41895485 #>>41903092 #
jimmaswell ◴[] No.41892918{7}[source]
I've been using it for over a year, I think. I don't often feed it tasks with comments so much as go about things the same as usual and let it autocomplete. The time and cognitive load saved add up massively. I've had to go without it for a bit while my workplace gets its license in order for the corporate version (the personal version has an issue with the proxy), and it's been agonizing going without it again. I almost forgot how much it sucks having to jump to Google every other minute, and it was easy to take for granted how much context Copilot was letting me not have to hold in my head. It really lets me work on the problem rather than being mired in immaterial details. It feels like I'm at least 2x slower overall without it.
replies(2): >>41893474 #>>41893539 #
atq2119 ◴[] No.41893539{8}[source]
> I almost forgot how much it sucks having to jump to google every other minute

Even allowing for some hyperbole, your programming experience is extremely different from mine. Looking anything up outside the IDE, let alone via Google, is by far the exception for me rather than the rule.

I've long suspected that this kind of difference explains a lot of the difference in how Copilot is perceived.

replies(1): >>41894099 #
namaria ◴[] No.41894099{9}[source]
Claiming LLMs are a massive boost for coding productivity is becoming a red flag that the claimant has a tenuous grasp on the skills necessary. Yeah if you have to look up everything all the time and you can't tell the AI slop isn't very good, you can put out code quite fast.
replies(5): >>41894349 #>>41895172 #>>41898052 #>>41900076 #>>41903110 #
h_tbob ◴[] No.41900076{10}[source]
Hey, we were all beginners once!

On another note, even if you are experienced, it helps when you're doing new stuff and don't know the proper syntax for what you want. For example, let's say you're using Flutter; you can just type

// bold

And it will help put the proper bold stuff in there.