
386 points ingve | 3 comments
1. abainbridge No.35739367
How does the benchmarking work here? I always find this kind of micro-benchmarking hard. I'd want to see results with and without a preceding cache flush, and with and without clearing the branch predictor state. Other things I find hard: 1) ensuring the CPU is running at full(ish) speed and isn't in a slower-clocked power-saving mode for part of the test; 2) effects of code and data alignment can be significant, so I want to measure a bunch of different alignments.

Does gtest (which the author used) help with these things? Does anything?

replies(2): >>35739444 >>35743578
2. krona No.35739444
Running within a Linux cset shield (a set of CPUs isolated from the scheduler) is fairly standard practice.

For benchmarks reporting times in the nanosecond range, a common approach is a linear regression over varying batch sizes; I'm not sure gtest does this.

But generally, don't trust any result without a (non-parametric) confidence interval, since confounding factors like OS jitter, CPU frequency scaling, and temperature can't easily be controlled, although some CPU features can be disabled.

3. kccqzy No.35743578
According to Intel, for accurate benchmarking you should write a Linux kernel module, and remember to disable preemption and interrupts while measuring.

https://www.intel.com/content/dam/www/public/us/en/documents...