
386 points ingve | 2 comments
cmovq ◴[] No.35738147[source]
> Those spikes for std::lower_bound are on powers of two, where it is somehow much slower. I looked into it a little bit but can’t come up with an easy explanation

It’s been a long time since I looked into this, but if I recall correctly, binary search over data whose size is close to a power of two is slower because of an interaction with the caches: the probed addresses keep mapping to the same cache sets, so the data constantly has to be reloaded from memory.

replies(2): >>35738297 #>>35738577 #
1. coppsilgold ◴[] No.35738297[source]
Cache associativity: a conflict miss. When too many memory addresses that are far apart all map to the same cache set, and are used and reused in quick succession, they repeatedly evict one another from the cache.

Scott Meyers had a very good talk which contained this information: <https://youtu.be/WDIkqP4JbkE?t=3614>

This is an effect of access pattern rather than size, but some access patterns are dictated by size.

replies(1): >>35740021 #
2. jove_ ◴[] No.35740021[source]
Size also matters for this problem: its effect is to slash your processor's effective cache size. If your working set is small enough that it fits in the cache anyway, you won't see the effect. That said, unless the cache is hot, it shouldn't make a difference in the first place. Moreover, I'm fairly confident in guessing that with a cold cache the difference between the algorithms would vanish.