(arxiv.org)

157 points galeos | 1 comments | 15 Nov 24 14:28 UTC

yalok ◴[20 Nov 24 17:54 UTC] No.42196490[source]▶

So basically the idea is to pack 3 ternary weights (-1, 0, 1) into 5 bits instead of 6, but they compare the results with an fp16 model, which would use 48 bits for those 3 weights…

And the speedup comes from reduced memory I/O, offset somewhat by the need to unpack these weights before using them…

Did I get this right?
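
For concreteness, here is a minimal sketch of the 3-weights-in-5-bits idea: three ternary values have 3^3 = 27 combinations, which fit in the 32 states of 5 bits. The base-3 encoding and the names `pack3`/`unpack3` are illustrative assumptions, not necessarily the layout the paper uses.

```python
TERNARY = (-1, 0, 1)


def pack3(w0, w1, w2):
    """Encode three ternary weights as one code in 0..26 (fits in 5 bits).

    Assumed encoding: shift each weight to 0..2 and treat the triple
    as a base-3 number. This is an illustration, not the paper's format.
    """
    for w in (w0, w1, w2):
        if w not in TERNARY:
            raise ValueError(f"not a ternary weight: {w!r}")
    return (w0 + 1) + 3 * (w1 + 1) + 9 * (w2 + 1)


def unpack3(code):
    """Decode a code in 0..26 back into three ternary weights."""
    if not 0 <= code < 27:
        raise ValueError(f"code out of range: {code!r}")
    w0 = code % 3 - 1
    w1 = (code // 3) % 3 - 1
    w2 = code // 9 - 1
    return (w0, w1, w2)


code = pack3(-1, 0, 1)
print(code, unpack3(code))
```

That works out to 5/3 ≈ 1.67 bits per weight versus 16 for fp16 (48 bits vs 5 bits per triple, about 9.6x less weight data to move), with the decode step as the price.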

replies(1): >>42197301 #

UncleOxidant ◴[20 Nov 24 19:32 UTC] No.42197301[source]▶

>>42196490 #

Yeah, that seems to be the case. Though, I suspect Microsoft is interested in implementing something like a custom RISC-V CPU with an ALU tuned for this ternary math, plus custom vector/matrix instructions. Something like that could save them a lot of power in their data centers.

If it were to catch on, then perhaps we'd see Intel, AMD, and ARM adding math ops optimized for ternary math?
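
To illustrate why ternary math is attractive for a tuned ALU: with weights restricted to -1, 0, and 1, a dot product needs no multiplies, only adds, subtracts, and skips. A rough sketch (the function name `ternary_dot` is hypothetical, and real kernels would work on packed data rather than Python lists):

```python
def ternary_dot(weights, activations):
    """Dot product with ternary weights using only add/subtract.

    Illustrative only: shows the arithmetic, not an optimized kernel.
    """
    if len(weights) != len(activations):
        raise ValueError("length mismatch")
    acc = 0
    for w, x in zip(weights, activations):
        if w == 1:
            acc += x      # +1 weight: add the activation
        elif w == -1:
            acc -= x      # -1 weight: subtract the activation
        elif w != 0:
            raise ValueError(f"not a ternary weight: {w!r}")
        # 0 weight: contributes nothing, so it is skipped
    return acc


print(ternary_dot([1, 0, -1, 1], [5, 7, 2, 3]))
```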

replies(1): >>42200664 #

1. yalok ◴[21 Nov 24 03:00 UTC] No.42200664[source]▶

>>42197301 #

My dream is to see ternary support at the HW wire level - that'd be even more power efficient, and the transistor count might be lower...

1-Bit AI Infrastructure