(github.com)

173 points galeos | 1 comments | 18 Oct 24 09:10 UTC

wwwtyro ◴[18 Oct 24 15:03 UTC] No.41880073[source]▶

Can anyone help me understand how this works without special bitnet precision-specific hardware? Is special hardware unnecessary? Maybe it just doesn't reach the full bitnet potential without it? Or maybe it does, with some fancy tricks? Thanks!

replies(3): >>41880204 #>>41880283 #>>41881707 #

summerlight ◴[18 Oct 24 17:40 UTC] No.41881707[source]▶

>>41880073 #

The major benefit is the significant reduction in memory consumption rather than the compute savings. The main bottleneck in current LLM infrastructure is typically memory bandwidth, which is why the chip industry is going crazy over HBM. Compute optimization certainly helps, but this is useful even without any hardware changes.
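
To put rough numbers on the memory-bandwidth point, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the 7B parameter count, the 100 GB/s memory bandwidth, and storing each low-bit weight packed into 2 bits. It also ignores the KV cache and activations and assumes every weight is read once per generated token, so treat the output as an upper bound on a bandwidth-bound decoder, not a benchmark.

```python
def weight_bytes(n_params: int, bits_per_weight: float) -> float:
    """Bytes needed to store the weights at a given precision."""
    return n_params * bits_per_weight / 8


def max_tokens_per_second(n_params: int, bits_per_weight: float,
                          bandwidth_bytes_per_s: float) -> float:
    """Upper bound on decode speed if every weight is streamed from
    memory once per token and nothing else limits throughput."""
    return bandwidth_bytes_per_s / weight_bytes(n_params, bits_per_weight)


# Hypothetical values, chosen only to illustrate the arithmetic.
N_PARAMS = 7_000_000_000      # assumed model size
BANDWIDTH = 100e9             # assumed memory bandwidth, bytes per second

fp16_bytes = weight_bytes(N_PARAMS, 16)
packed_bytes = weight_bytes(N_PARAMS, 2)   # assumed 2-bit packing

print(f"fp16 weights:  {fp16_bytes / 1e9:.2f} GB")
print(f"2-bit weights: {packed_bytes / 1e9:.2f} GB")
print(f"fp16 bound:    {max_tokens_per_second(N_PARAMS, 16, BANDWIDTH):.1f} tok/s")
print(f"2-bit bound:   {max_tokens_per_second(N_PARAMS, 2, BANDWIDTH):.1f} tok/s")
```

Under these assumptions the weights shrink by 8x and the bandwidth-bound ceiling on tokens per second rises by the same factor, on the same hardware.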

replies(1): >>41882331 #

1. az226 ◴[18 Oct 24 18:49 UTC] No.41882331[source]▶

>>41881707 #

Inference speeds go brrrr as well.

Microsoft BitNet: inference framework for 1-bit LLMs