
386 points ingve | 1 comment | HN request time: 0.219s | source
pizza234 ◴[] No.35738299[source]
Some information on CMOV can be found in the Intel Optimization Reference Manual (https://cdrdv2-public.intel.com/671488/248966-046A-software-...).

Torvalds was famously critical of it (https://yarchive.net/comp/linux/cmov.html); part of the criticism is now moot, though, due to low latencies on modern processors (apparently one cycle or less, although the instruction consumes internal flags).

His idea still seems applicable and consistent with Intel's recommendation: make branches predictable first, and only then use CMOV for the remaining ones. His fundamental assumption is that "even if you were to know that something is unpredictable, it's going to be very rare."

replies(3): >>35738345 #>>35741256 #>>35742105 #
2102922286 ◴[] No.35738345[source]
This is one area where Profile-Guided Optimization (PGO) can help a lot! With PGO, you run your program on some sample input and it logs info like how many times each side of a branch was taken. From there, you can recompile your code. If the compiler sees that one side of the branch dominates, it can emit code to prioritize that branch. However if the branch counts are approximately even and the branch is hard to predict (n.b. this is technically distinct from having even branch counts), then the compiler can know that the CPU would have trouble predicting the branch, and can emit a cmov instruction instead.
replies(4): >>35738413 #>>35739365 #>>35740748 #>>35745104 #
josephg ◴[] No.35739365[source]
I’m always surprised by Rust’s performance for this reason. The compiler outputs huge binaries, chock-full of bounds checks and the like. But performance doesn’t seem to suffer at all from it. On the contrary - I ported some well-optimized C to Rust and it ran faster.

I can only assume the compiler is marking all the bounds checks as unlikely to fail, and correctly predicted branches must be more or less free in modern CPUs. It’s the only way I can explain it.

replies(3): >>35739914 #>>35748312 #>>35750685 #
winstonewert ◴[] No.35748312[source]
Rust's large binaries are due to static linking, not to the extra bounds checks in particular.