
170 points judicious | 1 comments
MontyCarloHall ◴[] No.45406403[source]
This great video [0] demonstrates how CPU performance has only increased 1.5-2x over the past 15(!) years when executing extremely branchy code. Really shows you just how deep modern CPU pipelines have become.

The video also showcases a much more impressive branchless speedup: computing CRC checksums. Doing this naïvely with an if-statement for each bit is ~10x slower than doing it branchless with a single bitwise operation for each bit. The author of the article should consider showcasing this too, since it's a lot more impressive than the measly 1.2x speedup highlighted in the article. I assume the minimal/nonexistent speedups in the article are due to modern CPU branch prediction being quite good. But branch predictors inherently fail miserably at CRC because the conditional is on whether the input bit is 1 or 0, which is essentially random.
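For anyone who wants to see the two CRC variants side by side, here is a minimal sketch. It assumes the common reflected CRC-32 (polynomial 0xEDB88320); the video may use a different CRC or different code, and the function names are made up for illustration. The branchless version turns the low bit into an all-ones or all-zeros mask, so the XOR with the polynomial is applied unconditionally.

```c
#include <stddef.h>
#include <stdint.h>

/* Reflected CRC-32 polynomial (an assumption; other CRCs work the same way). */
#define CRC32_POLY 0xEDB88320u

/* Branchy version: one data-dependent if per bit. */
uint32_t crc32_branchy(const uint8_t *data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u)
                crc = (crc >> 1) ^ CRC32_POLY;
            else
                crc >>= 1;
        }
    }
    return ~crc;
}

/* Branchless version: the low bit becomes a mask that gates the XOR. */
uint32_t crc32_branchless(const uint8_t *data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* 0xFFFFFFFF if the low bit is set, 0 otherwise. */
            uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (CRC32_POLY & mask);
        }
    }
    return ~crc;
}
```

Both functions return the same checksum for the same input. An optimizing compiler may already turn the branchy version into a conditional move or a mask, so any measured speedup depends on the compiler, flags, and CPU.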

[0] https://www.youtube.com/watch?v=m7PVZixO35c

1. hinkley ◴[] No.45407758[source]
Branchless code is also useful for cryptographic transforms, since it frustrates timing attacks. In that situation it only needs to be reasonably fast relative to the branching alternative, because the goal is to improve security while limiting the overhead of doing so.
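As a rough illustration of the timing-attack point, here is a sketch of a comparison and a select with no data-dependent branches. The names `ct_equal` and `ct_select` are hypothetical, not from any particular library. The comparison always walks the full length instead of returning at the first mismatch, so its running time does not reveal where the inputs differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the buffers match, 0 otherwise, without an early exit. */
int ct_equal(const uint8_t *a, const uint8_t *b, size_t len) {
    uint8_t diff = 0;
    for (size_t i = 0; i < len; i++)
        diff |= (uint8_t)(a[i] ^ b[i]);   /* accumulate any differing bits */
    /* diff == 0 wraps to 0xFFFFFFFF, so bit 8 is set; any other value
       in 1..255 leaves bit 8 clear. */
    return (int)((((uint32_t)diff - 1u) >> 8) & 1u);
}

/* Returns a if cond is 1, b if cond is 0, using a mask instead of a branch. */
uint32_t ct_select(uint32_t cond, uint32_t a, uint32_t b) {
    uint32_t mask = 0u - (cond & 1u);     /* all ones or all zeros */
    return (a & mask) | (b & ~mask);
}
```

This is only a sketch: a compiler is free to reintroduce branches when optimizing, so real cryptographic code should use a vetted constant-time routine and check the generated assembly.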