
168 points misternugget | 6 comments
1. camel-cdr ◴[] No.42198301[source]
nth_set_bit_u64: wouldn't that be __builtin_ctzll(_pdep_u64(1ULL<<n, v)) with BMI2?
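
To spell out why that one-liner works, here is a minimal sketch in Rust. It uses a software stand-in for PDEP so it runs on any CPU. The names pdep_soft and nth_set_bit_reference are made up for illustration and are not from the article. PDEP scatters the low bits of its source into the set-bit positions of the mask, so depositing the single bit 1 << n into v leaves exactly one bit set, at the position of the n-th set bit of v (counting from 0). Counting trailing zeros then gives the index.

```rust
/// Software stand-in for the BMI2 PDEP instruction (hypothetical helper):
/// bit k of `src` is written to the position of the k-th set bit of `mask`.
fn pdep_soft(src: u64, mut mask: u64) -> u64 {
    let mut out = 0u64;
    let mut k = 0u32;
    while mask != 0 {
        // Isolate the lowest set bit that is still left in the mask.
        let lowest = mask & mask.wrapping_neg();
        if (src >> k) & 1 != 0 {
            out |= lowest;
        }
        // Clear that bit and move on to the next source bit.
        mask &= mask - 1;
        k += 1;
    }
    out
}

/// Index of the n-th set bit of `v` (n counts from 0),
/// or None if `v` has n or fewer set bits.
fn nth_set_bit_reference(v: u64, n: u32) -> Option<u32> {
    if n >= v.count_ones() {
        return None;
    }
    // Here n <= 63, so the shift cannot overflow.
    Some(pdep_soft(1u64 << n, v).trailing_zeros())
}
```

For example, with v = 0b1011_0100 the set bits sit at positions 2, 4, 5 and 7, so n = 2 yields 5. The hardware instruction does the scatter in one step; the loop above is only there to show the semantics.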
replies(3): >>42198733 #>>42199867 #>>42200581 #
2. SkiFire13 ◴[] No.42198733[source]
That's assuming you're ok with your program not running on some older cpus.
replies(1): >>42200177 #
3. kwillets ◴[] No.42199867[source]
That's my guess as well.

Bitstring rank/select is a well-known problem, and the BMI and non-BMI (Hacker's Delight) versions are available as a reference.

4. zamadatix ◴[] No.42200177[source]
That, and that you're not willing to entertain splitting the manual version out as a #[cfg(not(target_feature = "bmi2"))] fallback implementation. For something already down to ~1 ns, both of those may well be very reasonable assumptions, of course.
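
A rough sketch of what that split could look like, assuming a compile-time switch on the bmi2 target feature. The fallback here is a simple clear-the-lowest-bit loop chosen for brevity, not the article's hand-tuned version, and both variants assume n is smaller than the number of set bits in v.

```rust
// Variant compiled only when BMI2 is enabled at build time,
// e.g. with RUSTFLAGS="-C target-feature=+bmi2" or -C target-cpu=native.
#[cfg(all(target_arch = "x86_64", target_feature = "bmi2"))]
fn nth_set_bit_u64(v: u64, n: u32) -> u32 {
    use core::arch::x86_64::_pdep_u64;
    // SAFETY: this function only exists when bmi2 is statically enabled.
    let deposited = unsafe { _pdep_u64(1u64 << n, v) };
    deposited.trailing_zeros()
}

// Portable fallback for every other target (illustrative, not optimized).
#[cfg(not(all(target_arch = "x86_64", target_feature = "bmi2")))]
fn nth_set_bit_u64(mut v: u64, n: u32) -> u32 {
    // Clear the n lowest set bits, then locate the next one.
    for _ in 0..n {
        v &= v.wrapping_sub(1);
    }
    v.trailing_zeros()
}
```

Callers see a single nth_set_bit_u64 either way. The catch is that a compile-time cfg bakes the choice into the binary; picking at run time would need something like is_x86_feature_detected!("bmi2") instead.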
replies(1): >>42206263 #
5. stouset ◴[] No.42200581[source]
I believe the equivalent ARM64 instructions are in SVE2 which isn’t yet supported on Apple’s M-series chips as of M4, sadly.
6. Validark ◴[] No.42206263{3}[source]
AMD machines prior to Zen 3 had a micro-coded implementation of pdep and pext, so they're actually relatively expensive for those earlier Zen machines (as well as Bulldozer). Some people still have Ryzen 3000 series chips.

On the Intel side, pdep has been fast since its release with Haswell in 2013, so pretty much everyone using Intel should be fine in this regard.