mov eax, edi
sar eax, 31
mov ecx, eax
add edi, ecx
xor eax, edi
Has this been generated by a C compiler? If yes, it's a bit puzzling, because can't you remove "mov ecx, eax", replace "add edi, ecx" by "add edi, eax" and have the exact same result?
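To make the question concrete, here is a rough C model of the two sequences. It is a sketch for illustration, not anything a compiler produced. The function names are made up, the variables are named after the registers, and unsigned arithmetic stands in for 32-bit register wraparound. Both functions compute the absolute value of x and differ only in whether the sign mask is copied to ecx first.

```c
#include <stdint.h>

/* Model of the posted sequence, including the extra "mov ecx, eax". */
uint32_t abs_with_extra_mov(int32_t x)
{
    uint32_t edi = (uint32_t)x;
    /* mov eax, edi ; sar eax, 31  -> all ones if x is negative, else 0 */
    uint32_t eax = (x < 0) ? UINT32_MAX : 0u;
    uint32_t ecx = eax;     /* mov ecx, eax */
    edi += ecx;             /* add edi, ecx */
    eax ^= edi;             /* xor eax, edi */
    return eax;
}

/* Model of the shorter sequence suggested in the question. */
uint32_t abs_without_mov(int32_t x)
{
    uint32_t edi = (uint32_t)x;
    /* mov eax, edi ; sar eax, 31 */
    uint32_t eax = (x < 0) ? UINT32_MAX : 0u;
    edi += eax;             /* add edi, eax */
    eax ^= edi;             /* xor eax, edi */
    return eax;
}
```

In this model the two functions agree for every input, including INT32_MIN, where both return 2147483648. The only difference is whether the mask takes a detour through ecx.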
But rules of thumb are like this. If you know enough to question the rule of thumb, go ahead. Hand assembly in hot code can be worth the cost.
It's also possible the value in ecx is used again outside the snippet?
Yes, ecx might be used elsewhere, and in that case my second comment is irrelevant: I was responding to the suggestion that a wart this big is to be expected from compilers because such warts crop up regularly.
But then again, it's unlikely that ecx is used elsewhere: eax already holds the return value of the C snippet, so there's nothing left to do and the function can return. So the original question remains: did this come from a C compiler? If yes, it's crappy code.
Huge in space, sure. Not in execution time.
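int main (void) {
    for (int i = 0; i < 1000000000; ++i) {
        asm volatile (
            ".intel_syntax noprefix\n"
            "mov eax, edi\n"
            "sar eax, 31\n"
            "add edi, eax\n"
            "xor eax, edi\n"
            ".att_syntax prefix\n"      /* switch back; assumes default AT&T output (no -masm=intel) */
            : : : "eax", "edi", "cc");  /* registers and flags the block overwrites */
    }
    return 0;
}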
option1 has the extraneous mov ecx, eax and then adds ecx; option2, shown above, adds eax directly. I confirmed with objdump -d that the assembly hadn't been touched and that the loops were the same. On my otherwise mostly idle dual L5640 system, pinned to a single CPU (just in case), option1 consistently runs in 3.14 seconds and option2 consistently runs in 3.15 seconds.
With an extra zero on the iteration count, both option1 and option2 run in 30.94-30.95 user seconds. The extraneous move doesn't seem to cost any actual time.
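If you'd rather measure from inside the program than with an external timer, something like the following sketch could work. It uses the standard clock() function, which reports CPU time used by the process. The names cpu_seconds and noop_body are hypothetical helpers, not part of the test above, and calling through a function pointer adds per-iteration overhead that the inline-asm loop doesn't have, so treat the numbers as relative only.

```c
#include <time.h>

/* Placeholder workload; swap in the code you actually want to time. */
static void noop_body(void)
{
}

/* CPU seconds consumed by this process while calling body() iters times. */
double cpu_seconds(void (*body)(void), long iters)
{
    clock_t start = clock();
    for (long i = 0; i < iters; ++i)
        body();
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

For example, cpu_seconds(noop_body, 1000000000L) gives a baseline for the loop and call overhead, which you can subtract from the figure for a real body. As with the runs above, pinning to one CPU and repeating the measurement helps separate noise from a real difference.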
If you look at a program that must go faster and see unnecessary moves in the hot section(s), go ahead and remove them, but don't be surprised if it doesn't change much.
If you went and did your whole program by hand, the debloating might also not change much. That's why there's a rule of thumb.
If you have the skill to change the compiler so it emits a better sequence of instructions, that may be worthwhile: I suspect it's pretty difficult, but it could make enough of a difference over a large number of programs.