If I could take a SWAG at it, I'd say a low-resolution model like Llama 2 would probably be just fine (Llama 2 quantizes without too much headache), but a higher-resolution model like Llama 3 probably not so much, not without massive retraining anyway.
The 95% gain is specifically for multiplication operations only. Inference is compute-light and memory-heavy in the first place, so the actual end-to-end gains would be far smaller.
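To make that concrete, here's a back-of-envelope Amdahl-style bound. The 30% multiply share is purely an illustrative assumption on my part, not a measured figure:

```python
# Amdahl-style bound: a 95% saving on multiply energy caps the overall
# saving at 0.95 * (fraction of total energy spent on multiplies).
mult_fraction = 0.30   # ASSUMED share of inference energy spent on multiplies
mult_saving = 0.95     # claimed per-multiply saving
overall = mult_fraction * mult_saving
print(f"overall energy saving <= {overall:.0%}")  # ~28% under these assumptions
```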
Tech journalism (all journalism, really) can hardly be trusted to publish grounded news, given the focus on clicks and revenue they need to survive.
I'm not claiming it's impossible, nor am I claiming it isn't true, or at least honest.
But there will need to be evidence that, using real machines and real energy, an _equivalent performance_ is achievable. A defense that "there are no suitable chips" is a bit disingenuous. If the 95% savings actually has legs, some smart chip manufacturer will do the math and make the chips. If it's correct, that chip-making firm will make a fortune. If it's not, they won't.
Also,
> Additionally, it reduces energy consumption by 55.4% to 70.0%
With humility, I don't know what that means. It seems like some dubious math with percentages.
I would start by downloading a 1.58-bit model such as: https://huggingface.co/HF1BitLLM/Llama3-8B-1.58-100B-tokens
Run the non-quantized version of the model on your 3090/4090 GPU and observe the power draw, then load the 1.58-bit model and observe the power usage (see the sketch below). Sure, the numbers will vary widely, because there are many GPUs/NPUs on which to make the comparison.
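A minimal sketch of that measurement, assuming an NVIDIA card with the `nvidia-ml-py` (pynvml) bindings plus `torch` and `transformers` installed. The model IDs are just the two variants mentioned above; the 1.58-bit checkpoint may need its own loading path depending on your transformers version:

```python
import threading
import time

import pynvml
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def measure_power(model_id, prompt="Explain quantization briefly.", n_tokens=256):
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    samples = []
    stop = threading.Event()

    def sampler():
        # Poll GPU power draw every 100 ms while generation runs.
        while not stop.is_set():
            samples.append(pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0)  # mW -> W
            time.sleep(0.1)

    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="cuda"
    )
    inputs = tok(prompt, return_tensors="pt").to("cuda")

    t = threading.Thread(target=sampler)
    t.start()
    start = time.time()
    model.generate(**inputs, max_new_tokens=n_tokens)
    torch.cuda.synchronize()
    elapsed = time.time() - start
    stop.set()
    t.join()
    pynvml.nvmlShutdown()

    avg_w = sum(samples) / len(samples)
    print(f"{model_id}: {avg_w:.0f} W avg, {elapsed:.1f} s, "
          f"{avg_w * elapsed / n_tokens:.2f} J/token")


# e.g. measure_power("meta-llama/Meta-Llama-3-8B")
#      measure_power("HF1BitLLM/Llama3-8B-1.58-100B-tokens")
```

Comparing joules per token rather than raw wattage matters here: a quantized model that draws similar power but generates faster still comes out ahead on energy.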
That said, time to market has been more important than any concern for efficiency for some time. Now, and going forward, there is more of a focus on efficiency, as the expense of equipment and power has really grown.
Terrible logic. By that logic we wouldn't be using Python for machine learning at all (or x86 for compute). Yet here we are.
Right now, the only way to gain real knowledge is to read the comments on those articles.
They have been buying AMD CPUs for a while now, which says something about both AMD and Intel.
I like that the narrative has changed from "AI only runs on CUDA" to "sure, it runs fine on AMD if you must".