
166 points by galeos | 1 comment
newfocogi No.41879780
I'm enthusiastic about BitNet and the potential of low-bit LLMs - the papers show impressive perplexity scores matching full-precision models while drastically reducing compute and memory requirements. What's puzzling is that we're not seeing any major providers announce plans to leverage this for their flagship models, despite the clear efficiency gains that could theoretically enable much larger architectures. I suspect there might be hidden engineering challenges around specialized hardware requirements or training stability that aren't fully captured in the academic results, but I'd love insights from anyone closer to production deployment of these techniques.
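
(For anyone who hasn't dug into the papers: BitNet b1.58 constrains every weight to {-1, 0, +1} with an "absmean" rule: scale by the mean absolute value, round, clip. A toy numpy sketch of that idea, not the reference implementation:

    import numpy as np

    def absmean_ternary_quantize(w, eps=1e-5):
        # Absmean quantization as described in the BitNet b1.58 paper:
        # scale by the mean absolute weight, round, clip to {-1, 0, +1}.
        gamma = np.abs(w).mean()                    # per-tensor scale
        w_q = np.clip(np.round(w / (gamma + eps)), -1, 1)
        return w_q.astype(np.int8), gamma           # keep gamma to rescale outputs

    # Toy check: the dequantized matvec tracks the full-precision one.
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.02, size=(256, 256))
    x = rng.normal(size=256)
    w_q, gamma = absmean_ternary_quantize(w)
    print(np.corrcoef(w @ x, gamma * (w_q @ x))[0, 1])  # typically around 0.9

Once the weights are ternary, matrix multiplies reduce to adds and subtracts, which is where the claimed compute savings come from.)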
replies(6): >>41879903 #>>41880200 #>>41880375 #>>41881054 #>>41881230 #>>41882202 #
1. danielmarkbruce No.41881230
People are almost certainly working on it. The people who are actually serious about things like this aren't likely to announce "WE ARE BUILDING A CHIP OPTIMIZED FOR 1-BIT" or "WE ARE TRAINING A MODEL USING 1-BIT" before they're quite sure they can make it work at the required scale. It's still pretty researchy.
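
To make the chip angle concrete: with ternary weights a matvec needs no multipliers at all, just adds and subtracts, and that structure is exactly what a dedicated kernel or chip would exploit. A toy sketch (real kernels pack weights into 2-bit lanes and do far cleverer things):

    import numpy as np

    def ternary_matvec(w_q, x):
        # Each output element is a sum of selected +x and -x entries:
        # no multiplications anywhere, which is the "1-bit hardware" pitch.
        return np.array([x[row == 1].sum() - x[row == -1].sum() for row in w_q])

    rng = np.random.default_rng(0)
    w_q = rng.integers(-1, 2, size=(4, 256)).astype(np.int8)  # toy ternary weights
    x = rng.normal(size=256)
    print(np.allclose(ternary_matvec(w_q, x), w_q @ x))       # True

Whether that saving survives contact with real memory hierarchies and training pipelines at scale is the open question.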