Most active commenters
  • rowanG077(4)

←back to thread

386 points ingve | 21 comments | | HN request time: 1.645s | source | bottom
1. rowanG077 ◴[] No.35737974[source]
Nice algorithm. I don't agree that it is branchless however. Cmov is a branching instruction for sure. And it doesn't really matter whether a branch can be well predicted. The reason you mainly want branchless code is for security reasons where a timing sidechannel could reveal information. Calling this algorithm branchless devalues the term into something meaningless.

Edit: Everybody in the comments is focusing on performance. For performance sensitive code the point is not that it's branchless, the point is that it is fast, that some branchless code is faster than branching code is an implementation detail. During encryption the point is that it certain codepaths MUST be branchless.

replies(8): >>35738013 #>>35738026 #>>35738051 #>>35738053 #>>35738063 #>>35738216 #>>35738547 #>>35738591 #
2. berkut ◴[] No.35738013[source]
> "The reason you mainly want branchless code is for security reasons where a timing sidechannel could reveal information."

Erm, that likely depends on the industry you're in?

In HPC, branchless algorithms are often needed to use wide architectures (SIMD / GPUs) to their full capacity...

replies(1): >>35738370 #
3. vkazanov ◴[] No.35738026[source]
Branchless algos (especially in the case where there truly are no explicit or implicit branches) is just as useful for performance purposes as it is for security. Having a branchless algo at hand is like being halfway to data-level parallelism either through SIMD or GPU.
4. hmry ◴[] No.35738051[source]
cmov doesn't go through the branch predictor, and AFAIK it even loads both operands unconditionally every time before checking the condition. I see no reason to consider it a branch, and have never heard anybody else categorize it as one.
5. secondcoming ◴[] No.35738053[source]
> The reason you mainly want branchless code is for security reasons

That just isn’t true.

replies(1): >>35738128 #
6. 2102922286 ◴[] No.35738063[source]
cmov (i.e. conditional move) doesn't branch. By branch here, we mean "at point at which the successor program counter value might be one of two locations." It's true that this _does_ have security implications, however this is frequently used as a term of art by performance engineers (see https://www.youtube.com/watch?v=g-WPhYREFjk for example).

> And it doesn't really matter whether a branch can be well predicted.

I'm not so sure this is true. See https://lemire.me/blog/2019/10/15/mispredicted-branches-can-... as an example.

A properly predicted branch can be _faster_ than a cmov instruction, however the important point is avoiding the branch mispredict. The fact that we're using cmov is beside the point. We could achieve a similar effect by performing a subtraction and doing bitwise arithmetic to extract the sign bit (which, in effect, performs a comparison).

7. latency-guy2 ◴[] No.35738128[source]
Which part isn't true? I think the qualifier 'mainly' definitely weakens the statement, but I'm struggling to think of how it could certainly be false statement.

This is certainly true for at minimum DDoS attacks, and I would absolutely consider that a security risk.

replies(2): >>35738144 #>>35738371 #
8. 2102922286 ◴[] No.35738144{3}[source]
When implementing cryptographic primitives, you want to avoid branching on secret values. The reason why is that the CPU's branch predictor will attempt to predict the value that you're branching on, and thus something about the values that you're branching on gets revealed if you can see how long a CPU takes to run a task/perform a function.

This is more than just a theoretical issue. These channel attacks have been demonstrated in practice, even if the victim CPU is running across the internet.

9. cmovq ◴[] No.35738216[source]
> And it doesn't really matter whether a branch can be well predicted

I guess we build branch predictors on our CPUs for fun?

10. rowanG077 ◴[] No.35738370[source]
It's a requirement thing. For security the requirement is code needs to be branchless. In HPC the requirement is high performance, the branchless algorithm is an implementation detail. In fact branchless code is often much slower then the branching version.
replies(1): >>35739012 #
11. secondcoming ◴[] No.35738371{3}[source]
Branchless algos are all over the place outside of security.
replies(1): >>35738392 #
12. latency-guy2 ◴[] No.35738392{4}[source]
Don't care, there's a quote in the OP that this discussion is being limited to.
13. Dwedit ◴[] No.35738547[source]
I just had it pointed out to me that CMOV can be simulated with subtraction and bitwise operators.

Specifically, subtraction, arithmetic shift right, and AND. `((A-B) ASR 31) AND C`. Your result is C if `A - B < 0`, and your result is 0 if `A - B >= 0`.

replies(1): >>35738958 #
14. frankreyes ◴[] No.35738591[source]
Because in the context of security you associate "branchless" with the time it takes is always the same regardless of the data/values, in order to mitigate a timing attack.

In this context, branchless means something different: the performance of the CPU is maximized for the given hardware, by not flushing the speculative execution of the CPU.

15. spc476 ◴[] No.35738958[source]
In C, shifting signed integers is undefined behavior.
replies(1): >>35744227 #
16. Aissen ◴[] No.35739012{3}[source]
You don't really want branchless in security. The requirement is constant-time and side-channel resistance. The branchless algorithm is an implementation detail.
replies(1): >>35739313 #
17. rowanG077 ◴[] No.35739313{4}[source]
Constant-time is not possible with branching code. If it's possible your code runs through different paths then your time is not constant. Or if you disagree with me show me branching ASM code that has exactly the same perf counters independent of input.
replies(1): >>35739601 #
18. leroy-is-here ◴[] No.35739601{5}[source]
It’s possible to design an algorithm which branches and is also constant time. If, for all inputs, the instruction length (# of instructions executed) is the same even for different branch paths, then they must all run in the same time. This, of course, isn’t accounting for different (microscopic) execution times of individual instructions, but constant time isn’t really a measure of clock time anyway.
replies(1): >>35740492 #
19. rowanG077 ◴[] No.35740492{6}[source]
If your CPU is as simple as the one designed during a one quarter year uni class then sure. But that's simply not a realistic scenario with a modern CPU. Even if you have the same amount of instructions then branch prediction, memory latencies, super scalar execution and more will throw a wrench in that quickly. Even if you have the same instruction stream with different input data the latencies can be different. That is exactly what you are preventing with branchless code. It's been proven that it's possible to use this as an attack vector.

See a trivial example here: https://stackoverflow.com/questions/11227809/why-is-processi...

Where first massaging the data so the processing is branch predictor friendly results in a 6 time speedup. These things are not microscopic.

20. compiler-guy ◴[] No.35744227{3}[source]
Although it can bite you if you don’t know what you are doing, signed-integer shifting is not necessarily undefined behavior. Roughly, for positive signed integers, if the operation doesn’t overflow then the result is defined. For negative signed numbers, right shifts are implementation defined, but almost all modern systems define the result in the expected manner.

https://en.cppreference.com/w/cpp/language/operator_arithmet...

replies(1): >>35744838 #
21. kps ◴[] No.35744838{4}[source]
Oh, it's too bad C2x doesn't seem to be following C++20 here, since they're apparently mandating two's complement.