←back to thread

183 points vortex_ape | 7 comments | | HN request time: 1.25s | source | bottom
1. xeeeeeeeeeeenu ◴[] No.42742731[source]
> So on x86_64 processors, we have to branch to say “a 32-bit zero value has 32 leading zeros”.

Not if you're targeting x86-64-v3 or higher. Haswell (Intel) and Piledriver (AMD) introduced the LZCNT instruction that doesn't have this problem.

replies(3): >>42742835 #>>42742859 #>>42749499 #
2. sltkr ◴[] No.42742835[source]
You can also very trivially do (codepoint | 1).leading_zeros(), then you can also shave one byte off the LEN table. (This doesn't affect the result because LEN[32] == LEN[33] == 1).
3. pklausler ◴[] No.42742859[source]
Easy to count leading zeroes in a branch-free manner without a hardware instruction using a conditional move and a de Bruijn sequence; see https://github.com/llvm/llvm-project/blob/main/flang/include... .
replies(1): >>42744263 #
4. hinkley ◴[] No.42744263[source]

    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    x |= x >> 32;
Isn't there another way to do this without so many data races?

I feel like this should be

   x |= x >> 1 | x >> ??? ...
replies(1): >>42744484 #
5. gpderetta ◴[] No.42744484{3}[source]
By data races I assume you actually mean data dependencies?
replies(1): >>42750139 #
6. adrian_b ◴[] No.42749499[source]
Piledriver (2012) has introduced TZCNT. LZCNT had already been supported by all AMD CPUs since Barcelona (2007).

Moreover, LZCNT is the more important instruction, because it can replace TZCNT with the addition of a couple of instructions, without using any branches, even in the more rare cases when TZCNT is desired instead of LZCNT. Some older software continues to use the "ffs" gcc intrinsic, which corresponds to TZCNT, even if it is trivial to rewrite it to use LZCNT instead. In old gcc, "ffs" was available instead of the more useful "LZCNT", because the ancient DEC VAX computers included a "FFS" (find first set bit, starting from the LSB) instruction and the gcc "ffs" mapped directly to the DEC VAX instruction.

Haswell (2013) has added both LZCNT and TZCNT, being the first Intel CPU which supports them.

Unfortunately, even if Haswell is more than a decade old, the older Intel Atom CPUs until Tremont did not provide support for these instructions, so they have become ubiquitous only since 2021, even if all non-Atom CPUs have supported them for more than a decade (up to 18 years for AMD).

7. hinkley ◴[] No.42750139{4}[source]
Oops. Yes.