
111 points galeos | 2 comments
Havoc No.43715393
Is there a reason why the 1.58-bit models are always quite small? I think I've seen an 8B one, but that's about it.

Is there a technical reason for it, or just research convenience?

replies(2): >>43715453 >>43717231
1. yieldcrv No.43717231
They aren't; there is a 1.58-bit version of DeepSeek that's about 200 GB instead of 700.
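
A rough back-of-the-envelope on those sizes, assuming the roughly 671B parameters of the DeepSeek-V3/R1 class (the uniform bit-widths here are a simplifying assumption; shipped "1.58-bit" quants keep some layers at higher precision, which is why the real files land above the pure-ternary floor):

    # Rough size math for a ~671B-parameter model (DeepSeek-V3/R1 class).
    # Assumption: every weight stored at the nominal bit-width; real
    # "1.58-bit" quants keep some layers at 4+ bits, so files come out larger.
    params = 671e9

    fp8_gb = params * 8 / 8 / 1e9         # 8 bits per weight    -> ~671 GB
    ternary_gb = params * 1.58 / 8 / 1e9  # 1.58 bits per weight -> ~132 GB

    print(f"FP8:     ~{fp8_gb:.0f} GB")
    print(f"ternary: ~{ternary_gb:.0f} GB")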
replies(1): >>43719355
2. logicchains No.43719355
That's not a real BitNet, just a post-training quantisation, and its performance suffers compared to a model trained from scratch at 1.58 bits.
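
For reference, a minimal sketch of the absmean ternary quantizer described in the BitNet b1.58 paper, to make the distinction concrete (the function name and per-tensor scale granularity are my choices, not the paper's code):

    import torch

    def absmean_ternary(w: torch.Tensor, eps: float = 1e-5):
        # Scale by the mean absolute weight, then round into {-1, 0, +1},
        # per the absmean scheme in the BitNet b1.58 paper.
        scale = w.abs().mean().clamp(min=eps)
        q = (w / scale).round().clamp(-1, 1)
        return q, scale  # dequantize as q * scale

    # Post-training quantization applies this once to an FP-trained model.
    # A real BitNet applies it inside every forward pass during training
    # (with a straight-through estimator for gradients), so the weights are
    # learned under the ternary constraint rather than forced into it later.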