
103 points vortex_ape | 5 comments
1. xeeeeeeeeeeenu ◴[] No.42742731[source]
> So on x86_64 processors, we have to branch to say “a 32-bit zero value has 32 leading zeros”.

Not if you're targeting x86-64-v3 or higher. That level includes the LZCNT instruction, which doesn't have this problem; Intel has supported it since Haswell, and AMD since the ABM extension, which predates Piledriver.

replies(2): >>42742835 #>>42742859 #
2. sltkr ◴[] No.42742835[source]
You can also very trivially do (codepoint | 1).leading_zeros(); then you can also shave one byte off the LEN table. (This doesn't affect the result because LEN[31] == LEN[32] == 1: code points 0 and 1 both encode to a single byte.)
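
A minimal sketch of that idea in Rust, assuming the table is indexed directly by the leading-zero count of a u32. The names LEN_BY_LZ and utf8_len are made up for illustration and are not taken from the article; the sketch also does no validation, so surrogates and values above U+10FFFF are not rejected.

```rust
/// UTF-8 encoded length, indexed by the leading-zero count of `codepoint | 1`.
/// Because the OR forces bit 0 on, the count is always in 0..=31, so the
/// table needs 32 entries instead of 33. (Hypothetical name, for illustration.)
const LEN_BY_LZ: [u8; 32] = [
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0..=10 leading zeros: above the Unicode range
    4, 4, 4, 4, 4,                   // 11..=15: 17 to 21 significant bits
    3, 3, 3, 3, 3,                   // 16..=20: 12 to 16 significant bits
    2, 2, 2, 2,                      // 21..=24: 8 to 11 significant bits
    1, 1, 1, 1, 1, 1, 1,             // 25..=31: 1 to 7 significant bits
];

/// Length in bytes of the UTF-8 encoding of `codepoint`.
/// No validation: the caller is assumed to pass a valid scalar value.
pub fn utf8_len(codepoint: u32) -> usize {
    // `| 1` maps 0 to 1, which has the same encoded length (one byte),
    // so the zero input never needs a special case.
    LEN_BY_LZ[(codepoint | 1).leading_zeros() as usize] as usize
}
```

For any valid `char` c, utf8_len(c as u32) should agree with c.len_utf8().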
3. pklausler ◴[] No.42742859[source]
Easy to count leading zeroes in a branch-free manner without a hardware instruction using a conditional move and a de Bruijn sequence; see https://github.com/llvm/llvm-project/blob/main/flang/include... .
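
For readers who haven't seen the trick, here is a rough 32-bit sketch in Rust. It is not the flang code linked above; it uses the widely circulated de Bruijn multiplier 0x07C4ACDD, and the names DEBRUIJN, build_table, LOG2_TABLE and clz32 are made up for illustration. The final zero check is written as an ordinary `if`, on the assumption that the compiler lowers it to a conditional move.

```rust
/// Multiplier from the well-known 32-bit de Bruijn log2 trick.
const DEBRUIJN: u32 = 0x07C4_ACDD;

/// Build the lookup table at compile time. For a highest set bit at
/// position `i`, the smeared value is 2^(i+1) - 1; the top five bits of
/// its product with DEBRUIJN select the slot that stores `i`.
const fn build_table() -> [u8; 32] {
    let mut table = [0u8; 32];
    let mut i: u32 = 0;
    while i < 32 {
        let smeared = u32::MAX >> (31 - i);
        let slot = (smeared.wrapping_mul(DEBRUIJN) >> 27) as usize;
        table[slot] = i as u8;
        i += 1;
    }
    table
}

const LOG2_TABLE: [u8; 32] = build_table();

/// Count leading zeros of a 32-bit value without a hardware instruction.
pub fn clz32(x: u32) -> u32 {
    // Smear the highest set bit into every lower position.
    let mut v = x;
    v |= v >> 1;
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;
    // Multiply-and-shift maps each smeared value to a unique table slot.
    let log2 = LOG2_TABLE[(v.wrapping_mul(DEBRUIJN) >> 27) as usize] as u32;
    // Zero has no set bit; select 32 for it (typically a conditional move).
    if x == 0 { 32 } else { 31 - log2 }
}
```

The result should match u32::leading_zeros for every input, including zero.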
replies(1): >>42744263 #
4. hinkley ◴[] No.42744263[source]

    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    x |= x >> 32;

Isn't there another way to do this without so many data races?

I feel like this should be

    x |= x >> 1 | x >> ??? ...
replies(1): >>42744484 #
5. gpderetta ◴[] No.42744484{3}[source]
By data races I assume you actually mean data dependencies?
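
To make the dependency point concrete, a small Rust sketch, assuming x is a 64-bit value (the >> 32 step suggests so). The function names are made up for illustration. In the serial form every line reads the x written by the line before it. The second form folds only the first two steps into four shifts of the original input, which removes one link from the chain at the cost of an extra shift and OR; whether that is actually faster is not something this sketch measures.

```rust
/// The six-step smear quoted in the thread. Each statement depends on the
/// result of the previous one, so the steps form a serial chain.
pub fn smear_serial(mut x: u64) -> u64 {
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    x |= x >> 32;
    x
}

/// Same result with the first two steps folded together:
/// (x | x >> 1) | ((x | x >> 1) >> 2) == x | x >> 1 | x >> 2 | x >> 3.
/// The four terms read only the input, so they do not depend on each other.
pub fn smear_folded(x: u64) -> u64 {
    let mut y = x | (x >> 1) | (x >> 2) | (x >> 3);
    y |= y >> 4;
    y |= y >> 8;
    y |= y >> 16;
    y |= y >> 32;
    y
}
```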