Nice algorithm. I don't agree that it is branchless however. Cmov is a branching instruction for sure. And it doesn't really matter whether a branch can be well predicted. The reason you mainly want branchless code is for security reasons where a timing sidechannel could reveal information. Calling this algorithm branchless devalues the term into something meaningless.
Edit: Everybody in the comments is focusing on performance. For performance sensitive code the point is not that it's branchless, the point is that it is fast, that some branchless code is faster than branching code is an implementation detail. During encryption the point is that it certain codepaths MUST be branchless.
replies(8):