Beautiful branchless binary search

1. rowanG077 ◴[28 Apr 23 06:02 UTC] No.35737974[source]▶

Nice algorithm. I don't agree that it is branchless however. Cmov is a branching instruction for sure. And it doesn't really matter whether a branch can be well predicted. The reason you mainly want branchless code is for security reasons where a timing sidechannel could reveal information. Calling this algorithm branchless devalues the term into something meaningless.

Edit: Everybody in the comments is focusing on performance. For performance sensitive code the point is not that it's branchless, the point is that it is fast, that some branchless code is faster than branching code is an implementation detail. During encryption the point is that it certain codepaths MUST be branchless.

replies(8): >>35738013 #>>35738026 #>>35738051 #>>35738053 #>>35738063 #>>35738216 #>>35738547 #>>35738591 #

2. berkut ◴[28 Apr 23 06:08 UTC] No.35738013[source]▶

>>35737974 (TP) #

> "The reason you mainly want branchless code is for security reasons where a timing sidechannel could reveal information."

Erm, that likely depends on the industry you're in?

In HPC, branchless algorithms are often needed to use wide architectures (SIMD / GPUs) to their full capacity...

replies(1): >>35738370 #

3. vkazanov ◴[28 Apr 23 06:10 UTC] No.35738026[source]▶

>>35737974 (TP) #

Branchless algos (especially in the case where there truly are no explicit or implicit branches) is just as useful for performance purposes as it is for security. Having a branchless algo at hand is like being halfway to data-level parallelism either through SIMD or GPU.

4. hmry ◴[28 Apr 23 06:16 UTC] No.35738051[source]▶

>>35737974 (TP) #

cmov doesn't go through the branch predictor, and AFAIK it even loads both operands unconditionally every time before checking the condition. I see no reason to consider it a branch, and have never heard anybody else categorize it as one.

5. secondcoming ◴[28 Apr 23 06:17 UTC] No.35738053[source]▶

>>35737974 (TP) #

> The reason you mainly want branchless code is for security reasons

That just isn’t true.

replies(1): >>35738128 #

6. 2102922286 ◴[28 Apr 23 06:19 UTC] No.35738063[source]▶

>>35737974 (TP) #

cmov (i.e. conditional move) doesn't branch. By branch here, we mean "at point at which the successor program counter value might be one of two locations." It's true that this _does_ have security implications, however this is frequently used as a term of art by performance engineers (see https://www.youtube.com/watch?v=g-WPhYREFjk for example).

> And it doesn't really matter whether a branch can be well predicted.

I'm not so sure this is true. See https://lemire.me/blog/2019/10/15/mispredicted-branches-can-... as an example.

A properly predicted branch can be _faster_ than a cmov instruction, however the important point is avoiding the branch mispredict. The fact that we're using cmov is beside the point. We could achieve a similar effect by performing a subtraction and doing bitwise arithmetic to extract the sign bit (which, in effect, performs a comparison).

7. latency-guy2 ◴[28 Apr 23 06:29 UTC] No.35738128[source]▶

>>35738053 #

Which part isn't true? I think the qualifier 'mainly' definitely weakens the statement, but I'm struggling to think of how it could certainly be false statement.

This is certainly true for at minimum DDoS attacks, and I would absolutely consider that a security risk.

replies(2): >>35738144 #>>35738371 #

8. 2102922286 ◴[28 Apr 23 06:31 UTC] No.35738144{3}[source]▶

>>35738128 #

When implementing cryptographic primitives, you want to avoid branching on secret values. The reason why is that the CPU's branch predictor will attempt to predict the value that you're branching on, and thus something about the values that you're branching on gets revealed if you can see how long a CPU takes to run a task/perform a function.

This is more than just a theoretical issue. These channel attacks have been demonstrated in practice, even if the victim CPU is running across the internet.

9. cmovq ◴[28 Apr 23 06:42 UTC] No.35738216[source]▶

>>35737974 (TP) #

> And it doesn't really matter whether a branch can be well predicted

I guess we build branch predictors on our CPUs for fun?

10. rowanG077 ◴[28 Apr 23 07:05 UTC] No.35738370[source]▶

>>35738013 #

It's a requirement thing. For security the requirement is code needs to be branchless. In HPC the requirement is high performance, the branchless algorithm is an implementation detail. In fact branchless code is often much slower then the branching version.

replies(1): >>35739012 #

11. secondcoming ◴[28 Apr 23 07:05 UTC] No.35738371{3}[source]▶

>>35738128 #

Branchless algos are all over the place outside of security.

replies(1): >>35738392 #

12. latency-guy2 ◴[28 Apr 23 07:08 UTC] No.35738392{4}[source]▶

>>35738371 #

Don't care, there's a quote in the OP that this discussion is being limited to.

13. Dwedit ◴[28 Apr 23 07:33 UTC] No.35738547[source]▶

>>35737974 (TP) #

I just had it pointed out to me that CMOV can be simulated with subtraction and bitwise operators.

Specifically, subtraction, arithmetic shift right, and AND. `((A-B) ASR 31) AND C`. Your result is C if `A - B < 0`, and your result is 0 if `A - B >= 0`.

replies(1): >>35738958 #

14. frankreyes ◴[28 Apr 23 07:40 UTC] No.35738591[source]▶

>>35737974 (TP) #

Because in the context of security you associate "branchless" with the time it takes is always the same regardless of the data/values, in order to mitigate a timing attack.

In this context, branchless means something different: the performance of the CPU is maximized for the given hardware, by not flushing the speculative execution of the CPU.

15. spc476 ◴[28 Apr 23 08:44 UTC] No.35738958[source]▶

>>35738547 #

In C, shifting signed integers is undefined behavior.

replies(1): >>35744227 #

16. Aissen ◴[28 Apr 23 08:55 UTC] No.35739012{3}[source]▶

>>35738370 #

You don't really want branchless in security. The requirement is constant-time and side-channel resistance. The branchless algorithm is an implementation detail.

replies(1): >>35739313 #

17. rowanG077 ◴[28 Apr 23 09:50 UTC] No.35739313{4}[source]▶

>>35739012 #

Constant-time is not possible with branching code. If it's possible your code runs through different paths then your time is not constant. Or if you disagree with me show me branching ASM code that has exactly the same perf counters independent of input.

replies(1): >>35739601 #

18. leroy-is-here ◴[28 Apr 23 10:39 UTC] No.35739601{5}[source]▶

>>35739313 #

It’s possible to design an algorithm which branches and is also constant time. If, for all inputs, the instruction length (# of instructions executed) is the same even for different branch paths, then they must all run in the same time. This, of course, isn’t accounting for different (microscopic) execution times of individual instructions, but constant time isn’t really a measure of clock time anyway.

replies(1): >>35740492 #

19. rowanG077 ◴[28 Apr 23 12:31 UTC] No.35740492{6}[source]▶

>>35739601 #

If your CPU is as simple as the one designed during a one quarter year uni class then sure. But that's simply not a realistic scenario with a modern CPU. Even if you have the same amount of instructions then branch prediction, memory latencies, super scalar execution and more will throw a wrench in that quickly. Even if you have the same instruction stream with different input data the latencies can be different. That is exactly what you are preventing with branchless code. It's been proven that it's possible to use this as an attack vector.

See a trivial example here: https://stackoverflow.com/questions/11227809/why-is-processi...

Where first massaging the data so the processing is branch predictor friendly results in a 6 time speedup. These things are not microscopic.

20. compiler-guy ◴[28 Apr 23 17:14 UTC] No.35744227{3}[source]▶

>>35738958 #

Although it can bite you if you don’t know what you are doing, signed-integer shifting is not necessarily undefined behavior. Roughly, for positive signed integers, if the operation doesn’t overflow then the result is defined. For negative signed numbers, right shifts are implementation defined, but almost all modern systems define the result in the expected manner.

https://en.cppreference.com/w/cpp/language/operator_arithmet...

replies(1): >>35744838 #

21. kps ◴[28 Apr 23 18:00 UTC] No.35744838{4}[source]▶

>>35744227 #

Oh, it's too bad C2x doesn't seem to be following C++20 here, since they're apparently mandating two's complement.