
166 points galeos | 1 comment
wwwtyro ◴[] No.41880073[source]
Can anyone help me understand how this works without special bitnet precision-specific hardware? Is special hardware unnecessary? Maybe it just doesn't reach the full bitnet potential without it? Or maybe it does, with some fancy tricks? Thanks!
replies(3): >>41880204 #>>41880283 #>>41881707 #
hansvm ◴[] No.41880204[source]
I haven't checked this one out yet, but a common trick is to combine ordinary instructions with invariants on the data so that one wide register behaves like several independent "lanes".

The easiest example is xor, which can trivially be interpreted as either xoring one large integer or xoring a vector of smaller integers.
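
To make the xor point concrete, here is a minimal Python sketch. The function names are made up for illustration, and it assumes 8 one-byte lanes packed into a 64-bit value. Xor never carries between bit positions, so a single wide xor gives the same bytes as xoring each lane separately.

```python
def xor_as_one_integer(a: bytes, b: bytes) -> bytes:
    """XOR two 8-byte vectors using a single 64-bit integer XOR."""
    x = int.from_bytes(a, "little")
    y = int.from_bytes(b, "little")
    return (x ^ y).to_bytes(8, "little")


def xor_per_lane(a: bytes, b: bytes) -> bytes:
    """Reference version: XOR each 8-bit lane on its own."""
    return bytes(p ^ q for p, q in zip(a, b))


if __name__ == "__main__":
    a = bytes([1, 2, 3, 4, 5, 6, 7, 8])
    b = bytes([0xFF] * 8)
    # Both interpretations agree because XOR has no cross-bit carries.
    print(xor_as_one_integer(a, b) == xor_per_lane(a, b))
```

Operations with carries (add, multiply) don't get this for free, which is where the data invariants come in: you need to guarantee that a lane never overflows into its neighbour.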

Take a look at the SWAR example here [0] as a pretty common/easy example of that technique being good for something in the real world.
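
For reference, a sketch of the classic 64-bit SWAR popcount in Python. This is the textbook version, not necessarily the exact code in the linked post, and `popcount64` and `MASK64` are names I picked. Python integers are unbounded, so the explicit masks stand in for the 64-bit wraparound you would get for free in C.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def popcount64(x: int) -> int:
    """Count set bits in a 64-bit value by summing in parallel lanes."""
    x &= MASK64
    # Each 2-bit lane now holds the number of set bits it contained (0..2).
    x = x - ((x >> 1) & 0x5555555555555555)
    # Add neighbouring 2-bit lanes into 4-bit lanes (0..4 each).
    x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)
    # Add neighbouring 4-bit lanes into 8-bit lanes (0..8 each).
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F
    # The multiply sums all eight byte lanes into the top byte.
    return ((x * 0x0101010101010101) & MASK64) >> 56


if __name__ == "__main__":
    for v in (0, 1, 0xFF, 0x8000000000000001, MASK64):
        print(hex(v), popcount64(v), bin(v).count("1"))
```

The invariant doing the work is that no lane can overflow: a 2-bit lane holds at most 2, a 4-bit lane at most 4, a byte at most 8, and the final sum is at most 64, which fits in the top byte.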

Dedicated hardware is almost always better, but you can still get major improvements with a little elbow grease.

[0] https://nimrod.blog/posts/algorithms-behind-popcount/

replies(1): >>41880274 #
1. 15155 ◴[] No.41880274[source]
This is extremely easy to implement in an FPGA.