
170 points by judicious | 4 comments
1. MontyCarloHall No.45406403
This great video [0] demonstrates how CPU performance has only increased 1.5-2x over the past 15(!) years when executing extremely branchy code. Really shows you just how deep modern CPU pipelines have become.

The video also showcases a much more impressive branchless speedup: computing CRC checksums. Doing this naïvely with an if-statement for each bit is ~10x slower than doing it branchless with a single bitwise operation for each bit. The author of the article should consider showcasing this too, since it's a lot more impressive than the measly 1.2x speedup highlighted in the article. I assume the minimal/nonexistent speedups in the article are due to modern CPU branch prediction being quite good. But branch predictors inherently fail miserably at CRC because the conditional is on whether the input bit is 1 or 0, which is essentially random.

[0] https://www.youtube.com/watch?v=m7PVZixO35c
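
Here is a minimal Python sketch of the two CRC variants described above. It assumes the common reflected CRC-32 (polynomial 0xEDB88320, the variant zlib uses), and the function names are hypothetical. Python's interpreter overhead will swamp any branch-misprediction effect, so this only illustrates the transformation. The speedup in the video would come from compiled code, and an optimizing C compiler may already turn the branchy form into a conditional move.

```python
import zlib

POLY = 0xEDB88320  # reflected CRC-32 polynomial (assumed variant, matches zlib)


def crc32_branchy(data: bytes) -> int:
    """Bit-at-a-time CRC-32 with an if-statement per bit."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Data-dependent branch: taken or not depending on the input bits,
            # so a hardware branch predictor can't learn a useful pattern.
            if crc & 1:
                crc = (crc >> 1) ^ POLY
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


def crc32_branchless(data: bytes) -> int:
    """Same CRC-32, replacing the if-statement with a bit mask."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # mask is 0xFFFFFFFF when the low bit is 1, else 0.
            mask = -(crc & 1) & 0xFFFFFFFF
            crc = (crc >> 1) ^ (POLY & mask)
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    msg = b"123456789"
    print(hex(crc32_branchy(msg)), hex(crc32_branchless(msg)), hex(zlib.crc32(msg)))
```

Both functions should return 0xCBF43926 for `b"123456789"`, the standard CRC-32 check value, matching `zlib.crc32`.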

2. SynasterBeiter No.45406999
The linked video doesn't take into account the power consumption of these CPUs. He seems to be comparing laptop, NUC, and desktop CPUs. If you compare a 100W CPU with a 30W CPU that's a couple of years newer, you shouldn't be surprised there isn't much of a difference in performance.
3. MontyCarloHall No.45407096
Even if you exclude the three mobile CPUs in the charts (the 2012 i5, the 2015 i7, and the 2023 i9 NUC), the results still hold.

>If you compare a 100W CPU and a 30W CPU that's a couple of years newer, you shouldn't be surprised there isn't much of a difference in performance

Sure, but this is over a lot more than a couple of years. I'd expect a 2023 mobile i9 to be considerably more than twice as fast as a 2007 desktop Core 2 Duo.

4. hinkley No.45407758
Branchless code is also useful for cryptographic transforms because it frustrates timing attacks. There it only needs to be reasonably fast relative to the branching alternative, since the goal is to improve security while limiting the overhead of doing so.
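
Here is a rough Python sketch of the pattern, using hypothetical helper names. It shows three things: an early-exit comparison whose running time depends on where the first mismatch is, a branchless comparison that ORs together the XOR of every byte pair, and a mask-based select. CPython makes no real constant-time guarantees, so treat this as an illustration only. In actual Python code, `hmac.compare_digest` is the standard-library function intended for this job.

```python
import hmac


def leaky_equal(a: bytes, b: bytes) -> bool:
    """Early-exit comparison: time depends on the length of the matching prefix."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:  # secret-dependent branch leaks the mismatch position
            return False
    return True


def ct_equal(a: bytes, b: bytes) -> bool:
    """Branchless over the contents; the length is treated as public."""
    if len(a) != len(b):
        return False
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y  # accumulate differences without branching on them
    return diff == 0


def ct_select(bit: int, x: int, y: int) -> int:
    """Return x if bit == 1 else y, for 32-bit unsigned x and y, via a mask."""
    mask = -bit & 0xFFFFFFFF  # 0xFFFFFFFF when bit == 1, else 0
    return (x & mask) | (y & ~mask & 0xFFFFFFFF)


if __name__ == "__main__":
    tag, guess = b"secret-mac-tag", b"secret-mac-tax"
    print(leaky_equal(tag, guess), ct_equal(tag, guess), hmac.compare_digest(tag, guess))
    print(ct_select(1, 5, 9), ct_select(0, 5, 9))
```

The same "build a mask from the condition, then combine" trick is what the CRC example uses. The difference is that here the motivation is hiding the condition from a timing observer rather than avoiding mispredictions.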