Fastest Branchless Binary Search

I implemented a variation of this in ZFS’ btree implementation a few years ago:

https://github.com/openzfs/zfs/commit/677c6f8457943fe5b56d7a...

It only works if the comparator is inlined, which is why I had to use a macro and duplicate the code everywhere. C++’s templates effectively do the same thing.

By the way, I notice that C++ favors comparators returning a boolean while C favors comparators returning -1, 0 or 1. Let me share this simple macro from ZFS:

#define TREE_CMP(a, b) (((a) > (b)) - ((a) < (b)))

Use that to generate -1, 0 or 1. Then if you want a boolean, compare the output to < 0 to get the boolean that a C++ comparator would return and as long as the comparator is inlined, the compiler will automatically simplify, such that you get only 1 comparison.

The C++ STL designers who opted for a boolean return value from comparators did premature optimization, since they had no reason to break with the C convention under C++’s "the compiler will inline and optimize" philosophy. The neat thing about the C way is that if you want to generate a boolean from the comparator, you have a choice for the boolean’s semantics (< 0 or <= 0) while C++ made that choice for you.

That said, I have never needed the latter and as a much younger developer, I thought that the C++ way was superior, but time has caused me to conclude the C guys got this right.

Finally, since I am on topic of comparators, there is a fairly famous example of how to generate -1, 0 or 1 when comparing against 0:

https://www.cs.cornell.edu/courses/cs6120/2020fa/blog/super-...

I only mention it since it is interesting and I like sharing interesting things. Reimplementing the signum function from there with TREE_CMP() does not generate code as good as the superoptimized example in godbolt:

https://godbolt.org/z/1j8nWjnqP

It is probably an optimization opportunity for GCC.