←back to thread

386 points ingve | 5 comments | | HN request time: 0.001s | source
Show context
rowanG077 ◴[] No.35737974[source]
Nice algorithm. I don't agree that it is branchless however. Cmov is a branching instruction for sure. And it doesn't really matter whether a branch can be well predicted. The reason you mainly want branchless code is for security reasons where a timing sidechannel could reveal information. Calling this algorithm branchless devalues the term into something meaningless.

Edit: Everybody in the comments is focusing on performance. For performance sensitive code the point is not that it's branchless, the point is that it is fast, that some branchless code is faster than branching code is an implementation detail. During encryption the point is that it certain codepaths MUST be branchless.

replies(8): >>35738013 #>>35738026 #>>35738051 #>>35738053 #>>35738063 #>>35738216 #>>35738547 #>>35738591 #
berkut ◴[] No.35738013[source]
> "The reason you mainly want branchless code is for security reasons where a timing sidechannel could reveal information."

Erm, that likely depends on the industry you're in?

In HPC, branchless algorithms are often needed to use wide architectures (SIMD / GPUs) to their full capacity...

replies(1): >>35738370 #
1. rowanG077 ◴[] No.35738370[source]
It's a requirement thing. For security the requirement is code needs to be branchless. In HPC the requirement is high performance, the branchless algorithm is an implementation detail. In fact branchless code is often much slower then the branching version.
replies(1): >>35739012 #
2. Aissen ◴[] No.35739012[source]
You don't really want branchless in security. The requirement is constant-time and side-channel resistance. The branchless algorithm is an implementation detail.
replies(1): >>35739313 #
3. rowanG077 ◴[] No.35739313[source]
Constant-time is not possible with branching code. If it's possible your code runs through different paths then your time is not constant. Or if you disagree with me show me branching ASM code that has exactly the same perf counters independent of input.
replies(1): >>35739601 #
4. leroy-is-here ◴[] No.35739601{3}[source]
It’s possible to design an algorithm which branches and is also constant time. If, for all inputs, the instruction length (# of instructions executed) is the same even for different branch paths, then they must all run in the same time. This, of course, isn’t accounting for different (microscopic) execution times of individual instructions, but constant time isn’t really a measure of clock time anyway.
replies(1): >>35740492 #
5. rowanG077 ◴[] No.35740492{4}[source]
If your CPU is as simple as the one designed during a one quarter year uni class then sure. But that's simply not a realistic scenario with a modern CPU. Even if you have the same amount of instructions then branch prediction, memory latencies, super scalar execution and more will throw a wrench in that quickly. Even if you have the same instruction stream with different input data the latencies can be different. That is exactly what you are preventing with branchless code. It's been proven that it's possible to use this as an attack vector.

See a trivial example here: https://stackoverflow.com/questions/11227809/why-is-processi...

Where first massaging the data so the processing is branch predictor friendly results in a 6 time speedup. These things are not microscopic.