331 points giuliomagnifico | 14 comments

ndiddy ◴[] No.45377533[source]
Fun fact: Bob Colwell (chief architect of the Pentium Pro through Pentium 4) recently revealed that the Pentium 4 had its own 64-bit extension to x86 that would have beaten AMD64 to market by several years, but management forced him to disable it because they were worried that it would cannibalize IA64 sales.

> Intel’s Pentium 4 had our own internal version of x86–64. But you could not use it: we were forced to “fuse it off”, meaning that even though the functionality was in there, it could not be exercised by a user. This was a marketing decision by Intel — they believed, probably rightly, that bringing out a new 64-bit feature in the x86 would be perceived as betting against their own native-64-bit Itanium, and might well severely damage Itanium’s chances. I was told, not once, but twice, that if I “didn’t stop yammering about the need to go 64-bits in x86 I’d be fired on the spot” and was directly ordered to take out that 64-bit stuff.

https://www.quora.com/How-was-AMD-able-to-beat-Intel-in-deli...

replies(11): >>45377674 #>>45377914 #>>45378427 #>>45378583 #>>45380663 #>>45382171 #>>45384182 #>>45385968 #>>45388594 #>>45389629 #>>45391228 #
kimixa ◴[] No.45380663[source]
That's no guarantee it would have succeeded, though. AMD64 also cleaned up a number of warts in the x86 architecture, like adding more general-purpose registers (8 to 16).

While I suspect Intel's equivalent would have done similar things (a break that big makes such cleanups obvious), there's no guarantee it wouldn't have been worse than AMD64. Then again, it could also have turned out "better" in retrospect.

And also remember that, at the time, the Pentium 4 was very much struggling to deliver its advertised performance. One could argue that a major reason the AMD64 ISA took off is that the first chips to support it were (generally) superior even in 32-bit mode.

EDIT: And I'm surprised it got as far as silicon. AMD64 was "announced" and the spec released before the Pentium 4 even shipped, over 3 years before the first AMD implementations could be purchased. I guess Intel thought they didn't "need" to be public about it? And the AMD64 extensions cost a rather non-trivial amount of silicon and engineering effort to implement. Did the plan for Itanium change late enough in the P4 design that the 64-bit support couldn't be removed? Or perhaps this all implies it was a much less far-reaching (and so less costly) design?

replies(5): >>45381174 #>>45381211 #>>45384598 #>>45385380 #>>45386422 #
1. ghaff ◴[] No.45381211[source]
As someone who followed IA64/Itanium pretty closely, it's still not clear to me to what degree Intel (or at least groups within Intel) thought IA64 was a genuinely better approach, and to what degree they simply wanted to get out from the existing cross-licensing deals with AMD and others. There were certainly also constraints imposed by existing partnerships, notably with Microsoft.
replies(2): >>45381402 #>>45382598 #
2. ajross ◴[] No.45381402[source]
Both are likely true. It's easy to wave it away in hindsight, but there was genuine energy and excitement about the architecture in its early days. And while the first chips were late and on behind-the-cutting-edge processes, they were actually very performant (the FPU numbers were even world-beating; parallel VLIW dispatch really helped there).

Lots of people loved Itanium and wanted to see it succeed. But surely the business folks had their own ideas too.

replies(3): >>45381455 #>>45383151 #>>45383639 #
3. kimixa ◴[] No.45381455[source]
Yes. VLIW seems to lend itself to computation-heavy code; it's used to this day in many DSP architectures (and arguably GPUs, or at least it influences many GPU designs).
4. tw04 ◴[] No.45382598[source]
Given that Itanium originated at HP, it seems less likely that it was about AMD and more that, at the time, Intel was struggling with 64-bit. People are talking about the P4, but the Itanium architecture dates back to the late '80s…

https://en.m.wikipedia.org/wiki/Itanium

replies(1): >>45390482 #
5. ccgreg ◴[] No.45383151[source]
> they were actually very performant

Insanely expensive for that performance. I was the architect of HPC clusters in that era, and Itanic never made it to the top for price per performance.

Also, having lived through the software stack issues with the first beta chips of Itanic and AMD64 (and MIPS64, but who's counting), I can say AMD64 was way, way more stable than the others.

6. pjmlp ◴[] No.45383639[source]
I am one of those people, and I think it only failed because AMD was able to turn the tables on Intel, to use the article's title.

Without AMD64, I firmly believe eventually Itanium would have been the new world no matter what.

We see this all the time: technology that could be great but fails because it isn't pushed hard enough, and similar technology that does succeed because its creators are willing to push it at a loss for several years until it finally becomes the new way.

replies(3): >>45387412 #>>45389086 #>>45403145 #
7. ghaff ◴[] No.45387412{3}[source]
I'm inclined to agree, and I've written as much. In a world where 64-bit x86 wasn't really an option, Intel and "the industry" would probably have eventually figured out a way to make Itanium work well enough, and cost-effectively enough, and improved it incrementally over time. Some of the then-current RISC chips would probably have remained more broadly viable in that timeline, but in the absence of a viable alternative, 64-bit was going to happen, and therefore probably Itanium.

Maybe ARM gets a real kick in the pants, but high-performance ARM server processors were probably too far in the future to play a meaningful role.

8. Agingcoder ◴[] No.45389086{3}[source]
If I remember correctly, there was a fundamental difficulty with the ‘given a sufficiently smart compiler’ assumption, revolving around automatic parallelization (statically extracting enough instruction-level parallelism to fill the machine). You might argue that given enough time and money it might have been solved, but it’s a really hard problem.

(I might have forgotten)
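
One piece of it that I do remember: to pack two memory operations into the same issue group, the compiler has to prove they can't touch the same location, and with only static information it often can't. A toy sketch (the mini "IR" and the alias rule here are invented for illustration):

  # Toy sketch of one part of the "sufficiently smart compiler" problem: to
  # pack two memory ops into one issue group, the compiler must prove they
  # are independent. With only static information it often can't, so it
  # serializes work that is almost always independent at run time.

  def may_alias(ref_a, ref_b, known_distinct_bases):
      """Static dependence test: conservative unless independence is provable."""
      (base_a, _), (base_b, _) = ref_a, ref_b
      provably_distinct = ((base_a, base_b) in known_distinct_bases
                           or (base_b, base_a) in known_distinct_bases)
      return not provably_distinct   # can't prove it -> must assume a dependence

  store = ("p", 0)    # store to p[0]
  load = ("q", 0)     # load from q[0]; p and q are, say, pointer arguments

  # Without interprocedural information nothing says p and q point to
  # different arrays, so the two ops must be kept in program order:
  print(may_alias(store, load, set()))               # True  -> serialize
  # With a guarantee (known caller, or a 'restrict'-style annotation):
  print(may_alias(store, load, {("p", "q")}))        # False -> may pack together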

replies(1): >>45389534 #
9. ajross ◴[] No.45389534{4}[source]
The compilers did arrive, but obviously too late. Modern pipeline optimization and register scheduling in GCC and LLVM are wildly more sophisticated than anything people were imagining in 2001.
replies(1): >>45392989 #
10. mwpmaybe ◴[] No.45390482[source]
For context, it was intended to be the successor to PA-RISC and compete with DEC Alpha.
11. kimixa ◴[] No.45392989{5}[source]
But modern CPUs have even more capability for reordering/OoO execution and other "live" scheduling work. They will always have more information available than ahead-of-time static scheduling by the compiler, since so much is data dependent. If that weren't worth it, they would be slashing those capabilities instead.

Statically scheduled/in-order designs are still relegated to pretty much microcontrollers or specific numeric workloads. For general computation, it still seems like a poor fit.
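
To make the data-dependence point concrete, a toy single-issue scheduler (the instruction stream, the 3-versus-20-cycle load latencies, and which loads miss are all invented; no real ISA or core is modelled) showing how an out-of-order core routes around a latency the compiler couldn't have known, while an in-order core is stuck behind it:

  MISSES = {1, 4, 6}                  # these loads take 20 cycles instead of 3

  def make_trace(n=8):
      """Independent loads, each feeding one dependent add: (name, dep, latency)."""
      trace = []
      for i in range(n):
          trace.append((f"load{i}", None, 20 if i in MISSES else 3))
          trace.append((f"add{i}", f"load{i}", 1))
      return trace

  def run(trace, out_of_order):
      """Single-issue core: at most one instruction starts per cycle."""
      ready_at = {}                   # result name -> cycle its value is available
      pending = list(trace)
      cycle = 0
      while pending:
          # In-order may only consider the oldest instruction; OOO may pick any.
          window = pending if out_of_order else pending[:1]
          for ins in window:
              name, dep, lat = ins
              if dep is None or ready_at.get(dep, float("inf")) <= cycle:
                  ready_at[name] = cycle + lat
                  pending.remove(ins)
                  break               # issued one instruction this cycle
          cycle += 1                  # advance whether we issued or stalled
      return max(ready_at.values())

  trace = make_trace()
  print("in-order cycles:    ", run(trace, out_of_order=False))   # misses serialize
  print("out-of-order cycles:", run(trace, out_of_order=True))    # misses overlap

The in-order run eats each miss back to back because the younger, independent loads are stuck behind the stalled add; the out-of-order run gets those loads in flight during the stall, so the misses overlap.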

replies(1): >>45403960 #
12. thesz ◴[] No.45403145{3}[source]

> Without AMD64, I firmly believe eventually Itanium would have been the new world no matter what.

VLIW is not binary forward- or cross-implementation-compatible. If MODEL1 has 2 instructions per bundle and its successor MODEL2 has 4, code built for MODEL1 will run on MODEL2, but it will underperform due to underutilization. If execution latencies differ between two implementations of the same VLIW ISA, code tuned for one may not execute optimally on the other. Even different memory controllers and cache hierarchies can change what the optimal VLIW code looks like.

This precludes any VLIW from having multiple differently constrained implementations. You cannot segment VLIW implementations the way you can with x86, ARM, MIPS, PowerPC, etc., where the same code will be executed as optimally as possible on whatever concrete implementation of the ISA it runs on.

So no, Itanium (or any other VLIW, for that matter) would not have been the new world.
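
A toy illustration of the underutilization point (the bundle widths and the instruction stream are made up; this is not the actual IA-64 encoding): if the machine issues exactly one bundle per cycle, code packed for a narrow model cannot use the extra slots of a wider successor.

  def pack(ops, bundle_width):
      """Compile time: group (assumed independent) ops into fixed-width bundles."""
      return [ops[i:i + bundle_width] for i in range(0, len(ops), bundle_width)]

  def execute(bundles, machine_width):
      """Run time: one bundle per cycle; slots the bundle doesn't fill stay idle."""
      cycles = len(bundles)
      used_slots = sum(len(b) for b in bundles)
      return cycles, used_slots / (cycles * machine_width)   # (cycles, utilization)

  ops = [f"op{i}" for i in range(16)]            # 16 mutually independent ops

  old_binary = pack(ops, bundle_width=2)         # compiled for 2-wide MODEL1
  new_binary = pack(ops, bundle_width=4)         # recompiled for 4-wide MODEL2

  print("MODEL1, old binary:", execute(old_binary, machine_width=2))   # (8, 1.0)
  print("MODEL2, old binary:", execute(old_binary, machine_width=4))   # (8, 0.5)
  print("MODEL2, recompiled:", execute(new_binary, machine_width=4))   # (4, 1.0)

Only the recompiled binary gets the speedup, which is exactly the recompile-for-every-model treadmill.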

replies(1): >>45403947 #
13. ajross ◴[] No.45403947{4}[source]
> VLIW is not binary forward- or cross-implementation-compatible.

It was on IA-64; the bundle format was deliberately chosen to allow for easy extension.

But broadly it's true: you can't have a "pure" VLIW architecture independent of the issue and pipeline architecture of the CPU. Any device with a differing runtime architecture is going to have to do some cooking of the instructions to match them to its own backend. But that decode engine is much easier to write when it starts from a wide format that presents lots of instructions and makes explicit promises about their interdependencies.
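
For reference, a sketch of the bundle layout as I remember it: a 128-bit bundle carries a 5-bit template (which of the architecturally defined slot-type/stop-bit combinations applies) followed by three 41-bit instruction slots. The example value below is arbitrary and the real template table is omitted; this only shows the field extraction.

  def decode_bundle(bundle: int):
      """Split a 128-bit IA-64-style bundle into its template and three slots."""
      assert 0 <= bundle < (1 << 128)
      template = bundle & 0x1F                     # bits   0..4
      slot0 = (bundle >> 5) & ((1 << 41) - 1)      # bits   5..45
      slot1 = (bundle >> 46) & ((1 << 41) - 1)     # bits  46..86
      slot2 = (bundle >> 87) & ((1 << 41) - 1)     # bits  87..127
      return template, (slot0, slot1, slot2)

  # Arbitrary 128-bit value, purely to exercise the field extraction.
  bundle = int.from_bytes(bytes(range(16)), "little")
  template, slots = decode_bundle(bundle)
  print(f"template={template:#04x}")
  print(", ".join(f"slot{i}={s:#013x}" for i, s in enumerate(slots)))

The template plus the stop bits are the "explicit promises about interdependencies" part: the hardware is told which slots form an independent group rather than having to rediscover it.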

14. ajross ◴[] No.45403960{6}[source]
That's true. But if anything it cuts in the opposite direction of the argument: modern CPUs are doing all that optimization in hardware, at runtime. Doing it in software, ahead of time, is a no-brainer in comparison.