Does anyone have a collection of these things?
https://graphics.stanford.edu/~seander/bithacks.html
It is not on the list, but #define CMP(X, Y) (((X) > (Y)) - ((X) < (Y))) is an efficient way to do generic comparisons for code that wants UNIX-style comparators (negative, zero or positive result). If you compare the result against 0 to test for greater than, less than or equality, the compiler should simplify it automatically. For example, CMP(X, Y) > 0 simplifies to (X > Y).
The signum(x) function, which is equivalent to CMP(X, 0), can be done in 3 or 4 instructions, depending on your architecture, without any comparison operations:
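To make the comparator use concrete, here is a minimal sketch of CMP inside a qsort() callback. The names cmp_int and sort_ints are made up for illustration. Note that the macro evaluates each argument twice, so do not pass expressions with side effects.

```c
#include <stdlib.h>

/* Three-way comparison: evaluates to -1, 0 or 1.
 * Each argument is evaluated twice. */
#define CMP(X, Y) (((X) > (Y)) - ((X) < (Y)))

/* qsort()-style comparator for int (hypothetical name).
 * Unlike "return x - y", this cannot overflow. */
static int cmp_int(const void *a, const void *b)
{
    const int x = *(const int *)a;
    const int y = *(const int *)b;
    return CMP(x, y);
}

/* Hypothetical helper: sort an int array in ascending order. */
static void sort_ints(int *v, size_t n)
{
    qsort(v, n, sizeof v[0], cmp_int);
}
```

The subtraction idiom (x - y) is the usual alternative, but it overflows for values like INT_MIN and INT_MAX, which is one reason to prefer the CMP form.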
https://www.cs.cornell.edu/courses/cs6120/2022sp/blog/supero...
It is such a famous example that compilers probably optimize CMP(X, 0) to that, but I have not checked. Incidentally, the expansion of CMP(X, 0) is on the bit hacks list.
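For reference, a rough sketch of both forms in C: the CMP(X, 0) expansion and a well-known shift-based variant with no comparisons. The shift version is not necessarily the sequence the superoptimizer found, the function names are made up, and it assumes that right-shifting a negative int32_t is an arithmetic shift (implementation-defined in C, but true on mainstream compilers).

```c
#include <stdint.h>

/* signum via the CMP(X, 0) expansion: -1, 0 or 1. */
static int signum_cmp(int32_t x)
{
    return (x > 0) - (x < 0);
}

/* Shift-based signum without comparison operators (hypothetical name).
 * ASSUMPTION: x >> 31 is an arithmetic shift, giving -1 for negative x.
 * The unsigned negation yields a set top bit only when x is positive
 * (or INT32_MIN, where the -1 from the first term already wins). */
static int signum_shift(int32_t x)
{
    const int32_t neg = x >> 31;                              /* -1 if x < 0, else 0 */
    const int32_t pos = (int32_t)((uint32_t)-(uint32_t)x >> 31); /* 1 if x > 0, else 0 or 1 */
    return (int)(neg | pos);
}
```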
There are a few more superoptimized mathematical operations listed here:
https://www2.cs.arizona.edu/~collberg/Teaching/553/2011/Reso...
Note that the assembly code appears to be for the Motorola 68000 processor, and it relies on flags that are set in edge cases.
Finally, there is a list of helpful macros for bit operations that originated in OpenSolaris (as far as I know) here:
https://github.com/freebsd/freebsd-src/blob/master/sys/cddl/...
There used to be an OpenSolaris blog post on them, but Oracle has taken it down.
Enjoy!
I guess this only applies when the compiler knows what version of > you are using?
E.g., it might not work in C++ when < and > are overloaded, say for strings?
Even if you only need one of cosf() or sinf(), many CPUs calculate both values at the same time, so taking the other is free. If you only need single precision values, you can do this in double precision to avoid much of the error you would get by doing it in single precision.
This trick can be used to accelerate the RoPE relative positional encoding calculations used in inference for Llama 3 and likely others. I have done this and seen a measurable speedup, although these calculations are such a small part of inference that it was a small improvement.
https://godbolt.org/z/nGbPhz86q
If you did not inline the operator overloads and had them in another compilation unit, do not expect this to simplify (unless you use LTO).
If you have compound comparators in the operator overloads (such that on equality in one field, a second field is considered as a tie-breaker), I would not expect it to simplify, although the compiler could surprise me.