
352 points ferriswil | 1 comment | source
kayo_20211030 ◴[] No.41890110[source]
Extraordinary claims require extraordinary evidence. Maybe it's possible, but consider that some really smart people, in many different groups, have been working diligently in this space for quite a while; so a claim of 95% savings on energy costs _with equivalent performance_ is in the extraordinary category. Of course, we'll see when the tide goes out.
replies(6): >>41890280 #>>41890322 #>>41890352 #>>41890379 #>>41890428 #>>41890702 #
Randor ◴[] No.41890352[source]
The energy claims up to ~70% can be verified. The inference implementation is here:

https://github.com/microsoft/BitNet
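[Editor's note: a minimal sketch of where the savings come from, not the BitNet implementation itself. With ternary weights in {-1, 0, +1} (the "1.58-bit" format), a matrix-vector product needs only additions and subtractions in the inner loop, plus one scalar multiply per row for the scale. Function names here are illustrative.]

```python
import numpy as np

def ternary_quantize(w, eps=1e-8):
    # Quantize weights to {-1, 0, +1} with a single absmean scale,
    # loosely in the spirit of BitNet b1.58 (illustrative only).
    scale = np.abs(w).mean() + eps
    q = np.clip(np.round(w / scale), -1, 1).astype(np.int8)
    return q, scale

def ternary_matvec(q, scale, x):
    # Matrix-vector product with ternary weights: the inner loop uses
    # only additions and subtractions, no multiplications.
    out = np.zeros(q.shape[0], dtype=np.float64)
    for i in range(q.shape[0]):
        acc = 0.0
        for j in range(q.shape[1]):
            if q[i, j] == 1:
                acc += x[j]
            elif q[i, j] == -1:
                acc -= x[j]
        out[i] = acc
    return out * scale  # one scalar multiply per output element

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))
x = rng.normal(size=8)
q, s = ternary_quantize(W)
approx = ternary_matvec(q, s, x)   # mult-free approximation of W @ x
exact = W @ x
```

Integer adds cost far less energy in silicon than floating-point multiplies, which is the hardware intuition behind the claimed savings.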

replies(2): >>41890548 #>>41891150 #
kayo_20211030 ◴[] No.41890548[source]
I'm not an AI person in any technical sense. The savings being claimed, and I assume verified, are on ARM and x86 chips. The piece doesn't mention swapping multiplications for additions, and a 1-bit LLM is, well, a 1-bit LLM.

Also,

> Additionally, it reduces energy consumption by 55.4% to 70.0%

With humility, I don't know what that means. It seems like some dubious math with percentages.
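[Editor's note: one plausible reading of the quoted range is that per-benchmark energy reductions varied between 55.4% and 70.0%, i.e. the quantized model drew between 44.6% and 30.0% of the baseline energy. A tiny arithmetic sketch of that reading:]

```python
def remaining_fraction(reduction_pct):
    # A reduction of r% means the new cost is (100 - r)% of baseline.
    return 1.0 - reduction_pct / 100.0

# Reading "55.4% to 70.0%" as a range of per-benchmark reductions:
low = remaining_fraction(55.4)    # best case for the baseline: ~0.446x
high = remaining_fraction(70.0)   # best case for the quantized model: 0.30x
```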

replies(2): >>41890656 #>>41891000 #
Randor ◴[] No.41890656[source]
> I don't know what that means. It seems like some dubious math with percentages.

I would start by downloading a 1.58 model such as: https://huggingface.co/HF1BitLLM/Llama3-8B-1.58-100B-tokens

Run the non-quantized version of the model on your 3090/4090 GPU and observe the power draw. Then load the 1.58-bit model and observe the power usage again. Sure, the numbers will have a wide range, because there are many GPUs/NPUs on which to make the comparison.
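[Editor's note: a rough sketch of the measurement the comment describes, for NVIDIA GPUs. It polls `nvidia-smi`'s standard `power.draw` query while inference runs and integrates the samples into joules; the helper names are hypothetical and instantaneous-power polling is only an approximation of true energy use.]

```python
import subprocess

def read_power_watts():
    # Sample instantaneous board power via nvidia-smi (NVIDIA GPUs only).
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=power.draw",
         "--format=csv,noheader,nounits"],
        text=True)
    return float(out.strip().splitlines()[0])

def energy_joules(samples_watts, interval_s):
    # Trapezoidal integration of evenly spaced power samples into joules.
    e = 0.0
    for a, b in zip(samples_watts, samples_watts[1:]):
        e += 0.5 * (a + b) * interval_s
    return e
```

To compare fairly, poll `read_power_watts()` at a fixed interval while each model generates the same prompts, then compare joules per generated token rather than raw wattage, since the two models may run at different speeds.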

replies(1): >>41890880 #
kayo_20211030 ◴[] No.41890880{3}[source]
Good one!