
386 points | ingve | 1 comment
1. usefulcat No.35741930
> In fact in Clang branchless_lower_bound is slower than std::lower_bound

That's strange. I've profiled it and found that branchless_lower_bound is still faster than std::lower_bound with clang 14, just not as fast as with gcc 12 (on Intel Broadwell). I'm using gcc's libstdc++ in both cases; maybe he was using libc++ with clang?

Edit:

Replacing the contents of the for loop with the following improves performance for clang but reduces performance for gcc:

    const size_t increment[] = { 0, step };
    begin += increment[compare(begin[step], value)];