
386 points | ingve | 1 comment
2102922286 No.35738100
A cool related algorithm is https://algorithmica.org/en/eytzinger

In addition to being branchless, it also has better cache behavior than a standard binary search over a sorted array.

If you're doing a binary search in an array, you start in the middle, then jump to the midpoint of one half, and so on. As a result, successive reads are far apart in the array, so each early read lands on a different cache line and the cache gives you little help.
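The stride pattern is easy to see by recording which indices a textbook binary search touches; a small sketch (the function name is mine):

```cpp
#include <vector>

// Record the indices a textbook binary search probes while
// narrowing in on position `target` of an n-element sorted array.
std::vector<int> probe_sequence(int n, int target) {
    std::vector<int> probes;
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        probes.push_back(mid);
        if (mid == target) break;
        if (mid < target) lo = mid + 1;
        else              hi = mid - 1;
    }
    return probes;
}
```

For n = 1,000,000 and target 0 the first probes are 499999, 249999, 124999, ...: each early jump spans hundreds of thousands of elements, vastly more than one 64-byte cache line.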

The CPU cache performs much better if the values you want to read are close to each other. Enter the Eytzinger Binary Search. The idea is that, if you're always going to be accessing the root of the tree, and then one of its two children--you should just put those children physically close to the root, so they'll all be on the same cache line!
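A minimal sketch of the layout and the branchless search, following the algorithmica.org write-up (function names are mine; `__builtin_ffs` is a GCC/Clang builtin):

```cpp
#include <cstddef>
#include <vector>

// Fill out[1..n] with the Eytzinger (BFS) order of a sorted array:
// out[1] is the root, and the children of out[k] sit at out[2*k] and
// out[2*k + 1], so the top of the tree is packed into the first cache line.
std::size_t eytzinger_build(const std::vector<int>& sorted,
                            std::vector<int>& out,
                            std::size_t i = 0, std::size_t k = 1) {
    if (k <= sorted.size()) {
        i = eytzinger_build(sorted, out, i, 2 * k);      // left subtree
        out[k] = sorted[i++];                            // this node
        i = eytzinger_build(sorted, out, i, 2 * k + 1);  // right subtree
    }
    return i;
}

// Branchless descent: index (in t) of the first element >= x, 0 if none.
int eytzinger_search(const std::vector<int>& t, int x) {
    int k = 1;
    const int n = (int)t.size() - 1;   // t[0] is an unused dummy slot
    while (k <= n)
        k = 2 * k + (t[k] < x);        // go right iff t[k] < x
    k >>= __builtin_ffs(~k);           // undo the trailing "right" turns
    return k;                          // 0 means x exceeds every element
}
```

For sorted = {1,2,3,4,5,6,7} the build produces t = {_, 4, 2, 6, 1, 3, 5, 7}: the root and the next two levels are the first entries of the array.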

srcreigh No.35738451
Trying to figure this out myself. So a cache line is 64 bytes. Ignoring pointers, you can fit eight 8-byte data/key values, or sixteen 4-byte data/key values. Pointers are entirely implicit (thx user below)

This would save you 3 or 4 memory reads respectively.

This strategy seems unlikely to help past the first few layers, though. So for a 100k-element binary tree, this should be a 21% or 31% performance improvement respectively: 17 vs 14 and 17 vs 13 reads.
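A back-of-envelope version of that arithmetic (helper names are mine; it assumes the first 64-byte line fully covers the top levels of the tree):

```cpp
#include <cmath>

// Memory reads for a plain binary search over n elements.
int search_reads(int n) {
    return (int)std::ceil(std::log2((double)n));
}

// With an Eytzinger layout, one 64-byte cache line holds 64/key_size
// keys, i.e. the top log2(64/key_size) levels of the tree, so roughly
// that many reads come for free once the first line is loaded.
int eytzinger_reads(int n, int key_size) {
    return search_reads(n) - (int)std::log2(64.0 / key_size);
}
```

Here search_reads(100000) is 17, while eytzinger_reads(100000, 8) and eytzinger_reads(100000, 4) come out to 14 and 13, matching the 17 vs 14 and 17 vs 13 counts above.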

And this would not help much for string data values unless the strings are at most 8 characters long :-)

This same thinking can be used to optimize database queries: given several table and index layouts, how many 8 kB disk reads will each one need? Wish I could get a job doing that kind of design work!

vidarh No.35738646
You don't store data nodes, or pointers. You store the keys. If you need to store data, just store them at the same index in another array.

The pointers are wholly implicit. E.g. assuming you leave index 0 empty, the left child of the node at index k is at 2k and the right child is at 2k+1.

In terms of benefit here, the main benefit appears to be that since you always know where the children will be, you can trade bandwidth for latency and trigger pre-fetching of the next step down before you've even decided if you're going left or right.
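A sketch of that trade, assuming an Eytzinger-ordered array t with t[0] unused; `__builtin_prefetch` and `__builtin_ffs` are GCC/Clang builtins, and real implementations pad the array so the prefetch address stays in bounds:

```cpp
#include <vector>

// Branchless Eytzinger search with an explicit prefetch: the two possible
// next nodes 2k and 2k+1 are adjacent, so a single prefetch around the
// grandchildren at 4k pulls in the next level's cache line before the
// comparison has even decided whether to go left or right.
int eytzinger_search_prefetch(const std::vector<int>& t, int x) {
    int k = 1;
    const int n = (int)t.size() - 1;           // t[0] is an unused dummy slot
    while (k <= n) {
        __builtin_prefetch(t.data() + 4 * k);  // advisory; never faults
        k = 2 * k + (t[k] < x);
    }
    k >>= __builtin_ffs(~k);                   // undo the trailing "right" turns
    return k;                                  // 0 means x exceeds every element
}
```

The prefetch costs extra memory bandwidth when the search goes the "wrong" way, but it hides one level of memory latency per step, which is the trade described above.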

rocqua No.35750668
How does it help pre-fetching?