170 points judicious | 1 comments | | HN request time: 0s | source
vdupras ◴[] No.45407294[source]
In the part about "abs", there's an assembly breakdown:

mov eax, edi

sar eax, 31

mov ecx, eax

add edi, ecx

xor eax, edi

Was this generated by a C compiler? If so, it's a bit puzzling: couldn't you remove "mov ecx, eax", replace "add edi, ecx" with "add edi, eax", and get exactly the same result?
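
For reference, the shorter sequence proposed here (mov, sar, add, xor) matches the usual branchless absolute-value idiom. Below is a minimal C sketch of that idiom, assuming a 32-bit integer and an arithmetic right shift for negative values (implementation-defined in C, though mainstream compilers behave this way). The name `abs_branchless` is made up for illustration and does not come from the article.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper illustrating the mov/sar/add/xor sequence.
 * Assumes >> on a negative int32_t is an arithmetic shift. */
static int32_t abs_branchless(int32_t x) {
    /* sar eax, 31: mask is all ones if x is negative, else zero */
    uint32_t mask = (uint32_t)(x >> 31);
    /* add edi, eax: x + mask is x - 1 for negative x, x otherwise.
     * Unsigned arithmetic avoids signed-overflow undefined behavior. */
    uint32_t sum = (uint32_t)x + mask;
    /* xor eax, edi: flips every bit when x was negative */
    return (int32_t)(sum ^ mask);
}
```

Only one copy of the mask is needed for both the add and the xor, which is why the extra register copy in the listing looks redundant.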

replies(2): >>45407475 #>>45409238 #
userbinator ◴[] No.45407475[source]
If you look at compiler output, you will always see plenty of small stupidities.
replies(1): >>45407566 #
vdupras ◴[] No.45407566[source]
So why does conventional wisdom say that compilers will, the vast majority of the time, outperform programmers writing assembly by hand? It seems contradictory to me.
replies(1): >>45409122 #
toast0 ◴[] No.45409122[source]
Because the cost of that unnecessary mov is very small, so the win from human assembly is very small.

But rules of thumb are like this. If you know enough to question the rule of thumb, go ahead. Hand assembly in hot code can be worth the cost.

It's also possible the value in ecx is used again outside the snippet?

replies(1): >>45409181 #
vdupras ◴[] No.45409181[source]
In that context, it's not very small, it's 20% (all instructions are register-to-register instructions, so they all have the same weight). It's huge.

Yes, there's the possibility that ecx is used elsewhere, and in that case my second comment is irrelevant, because I was responding to the idea that such a big wart is to be expected from compilers because warts crop up regularly.

But then again, it's unlikely that it's used elsewhere, because eax has the return value of the C snippet, there's nothing else to do, the function can return. So the original question remains: did this come from a C compiler? If yes, it's crappy code.

replies(1): >>45409270 #
toast0 ◴[] No.45409270[source]
> In that context, it's not very small, it's 20% (all instructions are register-to-register instructions, so they all have the same weight). It's huge.

Huge in space sure. Not in execution time.

replies(1): >>45409850 #
vdupras ◴[] No.45409850[source]
It's 20% of the execution time. All these instructions use the same number of cycles.
replies(1): >>45410274 #
toast0 ◴[] No.45410274[source]
Do they? I put together two quick-and-dirty nonsense test programs. This is option2:

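   int main (void) {
       for (int i = 0; i < 1000000000; ++i) {
            asm volatile (
                /* switch to Intel syntax with bare register names */
                ".intel_syntax noprefix\n"
                "mov eax, edi\n"
                "sar eax, 31\n"
                "add edi, eax\n"
                "xor eax, edi\n"
                /* restore the compiler's default AT&T syntax */
                ".att_syntax prefix\n"
            ::: "eax", "edi", "cc");
       }
       return 0;
   }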
option1 has the extraneous mov ecx, eax, and then add with ecx.

I confirmed with objdump -d that the assembly hadn't been touched and that the loops were the same. On my otherwise mostly idle dual L5640 system and pinned to a single cpu (just in case), option1 consistently runs in 3.14 seconds and option2 consistently runs in 3.15 seconds.

Adding an extra zero, both option1 and option2 run in 30.94-30.95 user seconds. The extraneous move doesn't seem to cost any actual time.
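
A comparison like this can also be timed from inside the program rather than with an external timer. The following is a rough sketch only, assuming a POSIX system with `clock_gettime`; `elapsed_seconds` and `time_loop` are hypothetical helper names, not part of the test programs described above, and it measures wall-clock time rather than user time.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Difference between two clock readings, in seconds. */
static double elapsed_seconds(struct timespec start, struct timespec end) {
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Hypothetical helper: call body() iters times and return the
 * wall-clock time taken, read from the monotonic clock. */
static double time_loop(void (*body)(void), long iters) {
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iters; ++i) {
        body();
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    return elapsed_seconds(start, end);
}
```

Each variant would go in its own function and be passed to `time_loop` with the same iteration count. Repeating the runs and pinning to one CPU, as described above, helps separate real differences from noise.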

replies(1): >>45410578 #
userbinator ◴[] No.45410578[source]
Microbenchmarks don't usually tell the whole story. Once the bloat adds up, the cache misses and macro-scale benchmarks will show a difference.
replies(1): >>45414532 #
toast0 ◴[] No.45414532[source]
I'm sure the size penalty adds up in some cases.

But if you look at your program that must go faster, and you see unnecessary moves in the hot section(s), go ahead and remove them, but don't be surprised if it doesn't change much.

If you went and did your whole program by hand, the debloating might also not change much. That's why there's a rule of thumb.

If you have the skill to change the compiler so it outputs a better sequence of instructions (I suspect that's pretty difficult), it may make enough of a difference over a large number of programs to be worthwhile.