
111 points galeos | 2 comments
Havoc No.43715393
Is there a reason why the 1.58-bit models are always quite small? I think I've seen an 8B one, but that's about it.

Is there a technical reason for it, or just research convenience?

replies(2): >>43715453 >>43717231
1. yieldcrv No.43717231
They aren't; there is a 1.58-bit version of DeepSeek that's about 200 GB instead of 700.
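
A rough back-of-the-envelope on those sizes, assuming the roughly 671B parameters of the DeepSeek-V3/R1 class (the uniform bit-widths here are a simplifying assumption; shipped "1.58-bit" quants keep some layers at higher precision, which is why the real files land above the pure-ternary floor):

    # Rough size math for a ~671B-parameter model (DeepSeek-V3/R1 class).
    # Assumption: every weight stored at the nominal bit-width; real
    # "1.58-bit" quants keep some layers at 4+ bits, so files come out larger.
    params = 671e9

    fp8_gb = params * 8 / 8 / 1e9         # 8 bits per weight    -> ~671 GB
    ternary_gb = params * 1.58 / 8 / 1e9  # 1.58 bits per weight -> ~132 GB

    print(f"FP8:     ~{fp8_gb:.0f} GB")
    print(f"ternary: ~{ternary_gb:.0f} GB")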
replies(1): >>43719355
2. logicchains No.43719355
That's not a real BitNet, just a post-training quantisation, and its performance suffers compared to a model trained from scratch at 1.58 bits.
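
For reference, a minimal sketch of the absmean ternary quantizer described in the BitNet b1.58 paper, to make the distinction concrete (the function name and per-tensor scale granularity are my choices, not the paper's code):

    import torch

    def absmean_ternary(w: torch.Tensor, eps: float = 1e-5):
        # Scale by the mean absolute weight, then round into {-1, 0, +1},
        # per the absmean scheme in the BitNet b1.58 paper.
        scale = w.abs().mean().clamp(min=eps)
        q = (w / scale).round().clamp(-1, 1)
        return q, scale  # dequantize as q * scale

    # Post-training quantization applies this once to an FP-trained model.
    # A real BitNet applies it inside every forward pass during training
    # (with a straight-through estimator for gradients), so the weights are
    # learned under the ternary constraint rather than forced into it later.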