←back to thread

331 points giuliomagnifico | 5 comments | | HN request time: 0.636s | source
Show context
ndiddy ◴[] No.45377533[source]
Fun fact: Bob Colwell (chief architect of the Pentium Pro through Pentium 4) recently revealed that the Pentium 4 had its own 64-bit extension to x86 that would have beaten AMD64 to market by several years, but management forced him to disable it because they were worried that it would cannibalize IA64 sales.

> Intel’s Pentium 4 had our own internal version of x86–64. But you could not use it: we were forced to “fuse it off”, meaning that even though the functionality was in there, it could not be exercised by a user. This was a marketing decision by Intel — they believed, probably rightly, that bringing out a new 64-bit feature in the x86 would be perceived as betting against their own native-64-bit Itanium, and might well severely damage Itanium’s chances. I was told, not once, but twice, that if I “didn’t stop yammering about the need to go 64-bits in x86 I’d be fired on the spot” and was directly ordered to take out that 64-bit stuff.

https://www.quora.com/How-was-AMD-able-to-beat-Intel-in-deli...

replies(11): >>45377674 #>>45377914 #>>45378427 #>>45378583 #>>45380663 #>>45382171 #>>45384182 #>>45385968 #>>45388594 #>>45389629 #>>45391228 #
kimixa ◴[] No.45380663[source]
That's no guarantee it would succeed though - AMD64 also cleaned up a number of warts on the x86 architecture, like more registers.

While I suspect the Intel equivalent would do similar things, simply from being a big enough break it's an obvious thing to do, there's no guarantee it wouldn't be worse than AMD64. But I guess it could also be "better" from a retrospective perspective.

And also remember at the time the Pentium 4 was very much struggling to get the advertised performance. One could argue that one of the major reasons that the AMD64 ISA took off is that the devices that first supported it were (generally) superior even in 32-bit mode.

EDIT: And I'm surprised it got as far as silicon. AMD64 was "announced" and the spec released before the pentium 4 was even released, over 3 years before the first AMD implementations could be purchased. I guess Intel thought they didn't "need" to be public about it? And the AMD64 extensions cost a rather non-trivial amount of silicon and engineering effort to implement - did the plan for Itanium change late enough in the P4 design that it couldn't be removed? Or perhaps this all implies it was a much less far-reaching (And so less costly) design?

replies(5): >>45381174 #>>45381211 #>>45384598 #>>45385380 #>>45386422 #
1. chasil ◴[] No.45381174[source]
The times that I have used "gcc -S" on my code, I have never seen the additional registers used.

I understand that r8-r15 require a REX prefix, which is hostile to code density.

I've never done it with -O2. Maybe that would surprise me.

replies(3): >>45381498 #>>45381833 #>>45387856 #
2. astrange ◴[] No.45381498[source]
You should be able to see it. REX prefixes cost a lot less than register spills do.

If you mean literally `gcc -S`, -O0 is worse than not optimized and basically keeps everything in memory to make it easier to debug. -Os is the one with readable sensible asm.

replies(1): >>45381519 #
3. chasil ◴[] No.45381519[source]
Thanks, I'll give it a try.
4. o11c ◴[] No.45381833[source]
Obviously it depends on how many live variables there are at any point. A lot of nasty loops have relatively few non-memory operands involved, especially without inlining (though even without inlining, the ability to control ABI-mandated spills better will help).

But it's guaranteed to use `r8` and `r9` for for a function that takes 5 and 6 integer arguments (including unpacked 128-bit structs as 2 arguments), or 3 and 4 arguments (not sure about unpacking) for Microsoft. And `r10` is used if you make a system call on Linux.

5. wat10000 ◴[] No.45387856[source]
I don't have gcc handy, but this bit of code pretty easily gets clang to use several of them:

    int f(int **x) {
        int *a = x[0]; int *b = x[1]; int *c = x[2]; int *d = x[3];
        puts("hello");
        return *a + *b + *c + *d;
    }