331 points by giuliomagnifico | 47 comments
1. bombcar ◴[] No.45377061[source]
Youngsters today don't remember it; x86 was fucking dead according to the press; it really wasn't until Athlon 64 came out (which gave a huge bump to Linux as it was one of the first OSes to fully support it - one of the reasons I went to Gentoo early on was to get that sweet 64 bit compilation!) that everyone started to admit the Itanium was a turd.

The key to the whole thing was that it was a great 32 bit processor; the 64 bit stuff was gravy for many, later.

Apple did something similar with its CPU changes - now three of them - it only swaps when the old software, even emulated, runs better on the new chip than it did on the old.

AMD64 was also well thought out; it wasn't just four more address bytes slapped onto 32-bit. Doubling the number of general-purpose registers was noticeable - you took a performance hit going to 64-bit early on because all the memory addresses were wider, but the extra registers usually more than made up for it.

This is also where the NX (no-execute) bit entered.
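
Not from the thread, but here's a minimal sketch of what the NX bit made enforceable (POSIX-flavored C, assuming Linux; before NX, a readable x86 page was implicitly executable, so the missing PROT_EXEC below couldn't be honored):

    /* Sketch: NX lets the OS make a page writable but not executable. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void) {
        unsigned char code[] = { 0xC3 };  /* x86 'ret' instruction */

        /* Ask for a readable+writable page, deliberately without
         * PROT_EXEC. */
        void *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (page == MAP_FAILED)
            return 1;
        memcpy(page, code, sizeof code);

        /* With NX hardware, jumping into this page raises SIGSEGV
         * instead of running injected bytes:
         *   ((void (*)(void))page)();   <- would fault under NX */

        printf("writable, non-executable page at %p\n", page);
        return 0;
    }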

replies(4): >>45377177 #>>45377584 #>>45377642 #>>45377870 #
2. drob518 ◴[] No.45377177[source]
Itanium wasn’t a turd. It was just not compatible with x86. And that was enough to sink it.
replies(10): >>45377228 #>>45377279 #>>45377290 #>>45377368 #>>45377474 #>>45377560 #>>45377649 #>>45378005 #>>45378555 #>>45379366 #
3. bombcar ◴[] No.45377228[source]
IIRC it didn't even do great against POWER and other bespoke OS/Chip combos, though it did way better there than generic x86.
replies(1): >>45385755 #
4. philipkglass ◴[] No.45377279[source]
I used it for numerical simulations and it was very fast there. But on my workstation many common programs like "grep" were slower than on my cheap Athlon machine. (Both were running Red Hat Linux at the time.) I don't know how much of that was a compiler problem and how much was an architecture problem; the Itanium numerical simulation code was built with Intel's own compiler but all the system utilities were built with GNU compilers.
5. fooker ◴[] No.45377290[source]
>Itanium wasn’t a turd

It required immense multi-year efforts from compiler teams to get passable performance with Itanium. And passable wasn't good enough.

replies(2): >>45377427 #>>45377504 #
6. eej71 ◴[] No.45377368[source]
Itanium was mostly a turd because it pushed so many optimization issues onto the compiler.
replies(2): >>45377786 #>>45388704 #
7. bombcar ◴[] No.45377427{3}[source]
Wasn't the only compiler that produced code worth anything for Itanium the paid one from Intel? I seem to recall complaining about it on the GCC lists.
replies(2): >>45377736 #>>45378374 #
8. textlapse ◴[] No.45377474[source]
I have worked next to an Itanium machine. It sounded like a helicopter while barely meeting its performance requirements.

We have come a long way from that to arm64 and amd64 as the default.

replies(1): >>45377617 #
9. Joel_Mckay ◴[] No.45377504{3}[source]
The IA-64 architecture had too much granularity of control dropped into software. Thus, reliable compiler designs were much more difficult to build.

It wasn't a bad chip, but like Cell or modern Dojo tiles, most people couldn't run it without understanding parallelism and core metastability.

amd64 wasn't initially perfect either, but was accessible for mere mortals. =3

10. cmrdporcupine ◴[] No.45377560[source]
Itanium was pointless when Alpha existed already and was already getting market penetration in the high end market. Intel played disgusting corporate politics to kill it and then push the ugly failed Itanium to market, only to have to panic back to x86_64 later.

I have no idea how/why Intel got a second life after that, but they did. Which is a shame. A sane market would have punished them and we all would have moved on.

replies(5): >>45377872 #>>45377958 #>>45377964 #>>45379589 #>>45381495 #
11. golddust-gecko ◴[] No.45377584[source]
100% -- the conventional wisdom was that the x86 architecture was too riddled with legacy and complexity to improve its performance, and was a dead end.

Itanium never met an exotic computer architecture journal article that it didn't try and incorporate. Initially this was viewed as "wow such amazing VLIW magic will obviously dominate" and subsequently as "this complexity makes it hard to write a good compiler for, and the performance benefit just doesn't justify it."

Intel had to respond to AMD with their "x86-64" copy, though it really didn't want to.

Eventually it became obvious that the amd64/x64/x86-64 chips were going to exceed Itanium in performance, and with the massive momentum of legacy on their side, Itanium was toast.

replies(1): >>45379207 #
12. Joel_Mckay ◴[] No.45377617{3}[source]
The stripped-down ARMv8/9 AArch64 is good for a lot of use-cases, but most of the vendor-specific ASIC advanced features were never enabled, for reliability reasons.

ARM is certainly better than before, but could have been much better. =3

13. jerf ◴[] No.45377642[source]
If I am remembering correctly, this was also a good time to be in Linux. Since the Linux world operated on source code rather than binary blobs, it was easier to convert software to run 64-bit native. Non-trivial in an age of C, but still much easier than the commercial world. I had a much more native 64-bit system running a couple of years before it was practical in the Windows world.
replies(2): >>45377728 #>>45378335 #
14. hawflakes ◴[] No.45377649[source]
Itanium was compatible with x86. In fact, it booted into x86 mode. Merced, the first implementation, had a part of the chip called the IVE (Intel Value Engine) that implemented x86 very slowly.

You would boot in x86 mode and run some code to switch to ia64 mode.

HP saw the end of the road for their solo efforts on PA-RISC, and Intel eyed the higher-end market against SPARC, MIPS, POWER, and Alpha (hehe, all those caps), so they banded together to tackle it.

But as AMD proved, you could win by scaling up instead of dropping an all-new architecture.

* worked at HP during the HP-Intel Highly Confidential project.

15. wmf ◴[] No.45377728[source]
Linux for Alpha probably deserves some credit for getting everything 64-bit-ready years before x86-64 came out.
replies(1): >>45384736 #
16. hawflakes ◴[] No.45377736{4}[source]
I lost track of it, but HP, as co-architect, had its own compiler team working on it. I think SGI also had efforts to target ia64. But EPIC (Explicitly Parallel Instruction Computing) didn't really catch on: a classic VLIW binary would need recompilation for each new chip, while EPIC promised that old binaries would still run.

https://en.wikipedia.org/wiki/Explicitly_parallel_instructio...

replies(2): >>45378119 #>>45380556 #
17. CoastalCoder ◴[] No.45377786{3}[source]
IIRC, wasn't part of the issue that compile-time instruction scheduling was a poor match with speculative execution and/or hardware-based branch prediction?

I.e., the compiler had no access to information that's only revealed at runtime?

replies(1): >>45379932 #
18. jacquesm ◴[] No.45377870[source]
Up until the Athlon 64, your best bet for a 64-bit system was a DEC Alpha running Red Hat. Amazing levels of performance for a manageable amount of money.
19. dessimus ◴[] No.45377872{3}[source]
> I have no idea how/why Intel got a second life after that, but they did.

For the same reason the line "No one ever got fired for buying IBM." exists. Buying AMD at large companies was seen as a gamble that deciders weren't willing to make. Even now, if you just call up your account managers at Dell, HP, or Lenovo asking for servers or PCs, they are going to quote you Intel builds unless you specifically ask. I don't think I've ever been asked by my sales reps if I wanted an Intel or AMD CPU. Just how many slots/cores, etc.

replies(1): >>45378542 #
20. loloquwowndueo ◴[] No.45377958{3}[source]
“Sane market” sounds like an oxymoron, technology markets have multiple failed attempts at doing the sane thing.
21. toast0 ◴[] No.45377964{3}[source]
Historically, when Intel is on their game, they have great products, and better-than-most support for OEMs and integrators. They're also very effective at marketing and arm twisting.

The arm twisting gets them through rough times like Itanium and Pentium 4 + Rambus, etc. I still think they can recover from the 10nm fab problems, even though they're taking their sweet time.

22. kstrauser ◴[] No.45378005[source]
It absolutely was. It was possible, hypothetically, to write a chunk of code that ran very fast. There were any number of very small bits of high-profile code which did this. However, it was impossible to make general-purpose, not-manually-tuned code run fast on it. Itanium placed demands on compiler technology that simply didn't exist, and probably still don't.

Basically, you could write some tuned assembly that would run fast on one specific Itanium CPU release by optimizing for its exact number of execution units, etc. It was not possible to run `./configure && make && make install` for anything not designed with that level of care and end up with a binary that didn't run like frozen molasses.
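
To make the "exact number of execution units" point concrete, here's a sketch of mine (plain C, not the parent's code); UNROLL is a hypothetical stand-in for one chip's FP issue width, and the payoff evaporates on any chip where that number differs:

    /* Dot product unrolled to feed an assumed number of FP units.
     * Four independent accumulators = four parallel dependence
     * chains, sized to fill one specific machine's issue slots. */
    #include <stdio.h>

    #define UNROLL 4  /* hypothetical: matches one chip's FP width */

    static double dot(const double *a, const double *b, int n) {
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        int i;
        for (i = 0; i + UNROLL <= n; i += UNROLL) {
            s0 += a[i]     * b[i];
            s1 += a[i + 1] * b[i + 1];
            s2 += a[i + 2] * b[i + 2];
            s3 += a[i + 3] * b[i + 3];
        }
        for (; i < n; i++)  /* leftover elements */
            s0 += a[i] * b[i];
        return (s0 + s1) + (s2 + s3);
    }

    int main(void) {
        double a[] = { 1, 2, 3, 4, 5 }, b[] = { 5, 4, 3, 2, 1 };
        printf("%g\n", dot(a, b, 5));  /* prints 35 */
        return 0;
    }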

I had to manage one of these pigs in a build farm. On paper, it should've been one of the more powerful servers we owned. In practice, the Athlon servers were several times faster at any general purpose workloads.

23. fooker ◴[] No.45378119{5}[source]
In the compiler world, these HP compiler folks are leading compiler teams/orgs at ~all the tech companies now, while almost none of the Intel compiler people seem to be around.
replies(1): >>45384790 #
24. MangoToupe ◴[] No.45378335[source]
It also helps that Linux had much better 32-bit compatibility than Windows did. Not sure why, but it probably has something to do with legacy support Windows shed when moving to 64 bits.
replies(1): >>45386459 #
25. hajile ◴[] No.45378374{4}[source]
NOTHING produced good code for the original Itanium which is why they switched gears REALLY early on.

Intel first publicly mentioned Poulson all the way back in 2005, just FOUR years after the original chip launched. Poulson was basically a traditional out-of-order CPU core that even had hyperthreading [0]. They knew really early on that the designs just weren't that good. This shouldn't have been a surprise to Intel, as they'd already made a VLIW CPU in the 90s (the i860) that failed spectacularly.

[0] https://www.realworldtech.com/poulson/

replies(1): >>45378929 #
26. bombcar ◴[] No.45378542{4}[source]
The Intel chipsets were phenomenally stable; the AMD ones were always plagued by weird issues.
27. jcranmer ◴[] No.45378555[source]
I acquired a copy of the Itanium manuals, and in flicking through it, you can barely get through a page before going "you did WHAT?" over some feature.
replies(1): >>45380212 #
28. speed_spread ◴[] No.45378929{5}[source]
Even the i860 found more usage as a specialized CPU than the Itanium. The original NeXTcube had an optional video card that used an i860 dedicated to graphics.
29. Animats ◴[] No.45379207[source]
Back in that era I went to an EE380 talk at Stanford where the people from HP trying to do a compiler for Itanium spoke. The project wasn't going well at all. Itanium is an explicit-parallelism machine: the compiler has to figure out which operations to do in parallel, where most superscalar machines do that during execution. Instruction ordering and packing turned out to be a hard numerical optimization problem. The compiler developers sounded very discouraged.

It's amazing that retirement units, the part of a superscalar CPU that puts everything back together as the parallel operations finish, not only work but don't slow things down. The Pentium Pro's head designer had about 3,000 engineers working at peak, which indicates how hard this is. But it all worked, and that became the architecture of the future.

This was around the time that RISC was a big thing. Simplify the CPU, let the compiler do the heavy lifting, have lots of registers, make all instructions the same size, and do one instruction per clock. That's pure RISC. Sun's SPARC is an expression of that approach. (So is a CRAY-1, which is a large but simple supercomputer with 64 of everything.) RISC, or something like it, seemed the way to go faster. Hence Itanium. Plus, it had lots of new patented technology, so Intel could finally avoid being cloned.

Superscalars can get more than one instruction per clock, at the cost of insane CPU complexity. Superscalar RISC machines are possible, but they lose the simplicity of RISC. Making all instructions the same size increases the memory bandwidth the CPU needs. That's where RISC lost out to the x86 extensions: x86 is a terse notation.

So we ended up with most of the world still running on an instruction set based on the one Harry Pyle designed when he was an undergrad at Case in 1969.

30. Findecanor ◴[] No.45379366[source]
The Itanium had some interesting ideas executed poorly. It was a bloated design by committee.

It should have been iterated on a bit before it was released to the world, but Intel was stressed by there already being several 64-bit RISC processors on the market.

31. panick21_ ◴[] No.45379589{3}[source]
DEC tried to link up with Intel; Alpha would have become Intel's 64-bit architecture. This of course didn't happen, and Intel instead linked up with DEC's biggest competitor, HP, and adopted their much, much worse VLIW architecture.

Imagine a future where Intel and Apple had both adopted DEC's Alpha, instead of Intel pairing with HP and Apple with IBM.

32. duskwuff ◴[] No.45379932{4}[source]
Yes, absolutely. Itanium was designed with the expectation that memory speed/latency would keep pace with CPUs - it didn't.
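
A minimal sketch (mine, not duskwuff's) of the canonical case where the compiler is blind: pointer chasing, where every load's latency depends on cache state that only the running machine knows, so a static schedule has to guess:

    /* Each iteration starts with a load whose latency (L1 hit vs.
     * DRAM miss) is unknowable at compile time. An out-of-order core
     * reorders around the stalls; an in-order EPIC core is stuck
     * with the schedule the compiler guessed. */
    #include <stddef.h>
    #include <stdio.h>

    struct node { struct node *next; int val; };

    static int sum_list(const struct node *n) {
        int sum = 0;
        while (n != NULL) {
            sum += n->val;  /* uses the value just loaded */
            n = n->next;    /* serial chase: the next address isn't
                               known until this load completes */
        }
        return sum;
    }

    int main(void) {
        struct node c = { NULL, 3 }, b = { &c, 2 }, a = { &b, 1 };
        printf("%d\n", sum_list(&a));  /* prints 6 */
        return 0;
    }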
33. tptacek ◴[] No.45380212{3}[source]
Example example example example must see examples!
replies(1): >>45382061 #
34. nextos ◴[] No.45380556{5}[source]
Yes, SGI sold quite a lot of high-end IA-64 machines for HPC, e.g. https://en.wikipedia.org/wiki/SGI_Altix
35. j_not_j ◴[] No.45381495{3}[source]
Alpha had a lot of implementation problems, e.g. floating point exceptions with untraceable execution paths.

Cray tried to build the T3E (iirc) out of Alphas. DEC bragged how good Alpha was for parallel computing, big memory etc etc.

But Cray publicly denounced Alpha as unusable for parallel processing (the T3E was a bunch of Alphas in some kind of NUMA shared memory). It was so difficult to make the chips work together.

This was in the Cray Connect or some such glossy publication. Wish I'd kept a copy.

Plus, of course, the usual DEC marketing incompetence: they feared Alpha undoing their momentum in large, expensive machines, since small workstation boxes were significantly faster than big iron.

replies(2): >>45384852 #>>45385774 #
36. jcranmer ◴[] No.45382061{4}[source]
Some of the examples:

* Itanium has register windows.

* Itanium has register rotations, so that you can modulo-schedule a loop.

* Itanium has so many registers that a context switch involves spilling several KB of register state to memory.

* The main registers have "Not-a-Thing" values to be able to handle things like speculative loads that would have trapped. Handling this for register spills (or context switches!) appears to be "fun."

* It's a bi-endian architecture.

* The way you pack instructions in the EPIC encoding is... fun.

* The rules of how you can execute instructions mean that you kind of have branch delay slots, but not really.

* There are four floating-point environments because why not.

* Also, Itanium is predicated (see the sketch after this list).

* The hints, oh god the hints. It feels like every time someone came up with an idea for a hint that might be useful to the processor, it was thrown in there. How is a compiler supposed to be able to generate all of these hints?

* It's an architecture that's complicated enough that you need to handwrite assembly to get good performance, but the assembly has enough arcane rules that handwriting assembly is unnecessarily difficult.
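
As promised above, a minimal sketch (mine, in C rather than IA-64 assembly) of what full predication means: both arms of a branch become straight-line code selected by a predicate, so there is no branch left to mispredict:

    /* If-conversion, the transformation predication enables. On
     * IA-64 every instruction can be guarded by a predicate
     * register; here the predicate is just an int that is 0 or 1. */
    #include <stdio.h>

    /* Branchy version: control flow the hardware must predict. */
    static int abs_branchy(int x) {
        if (x < 0)
            return -x;
        return x;
    }

    /* Predicated version: both arms are computed unconditionally
     * and the predicate selects the result. No branch at all. */
    static int abs_predicated(int x) {
        int p = (x < 0);               /* the "predicate register" */
        return p * (-x) + (1 - p) * x; /* guarded by p and !p      */
    }

    int main(void) {
        printf("%d %d\n", abs_branchy(-5), abs_predicated(-5));
        return 0;
    }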

replies(2): >>45388980 #>>45421678 #
37. jabl ◴[] No.45384736{3}[source]
Well, in the sense of Alpha being the first 64-bit Linux port, and thus having to fix a lot of places where "bitness" assumptions had crept into the codebase.

DEC (Compaq?) had some plans to make cheaper Alpha workstations, and while they managed to drive down the price somewhat, the volumes were never there to make them price-competitive with PCs. (See also Raptor's Talos POWER machines.)

replies(1): >>45385736 #
38. jabl ◴[] No.45384790{6}[source]
Are you sure about that? If my memory serves, a lot of the Intel compiler people were transferred from HP. At least in the Fortran world, the Fortran frontend for the Intel compiler traces its lineage back to DEC Fortran (for VAX and later Alpha) -> Compaq Visual Fortran (for Windows) -> Intel Fortran.
39. jabl ◴[] No.45384852{4}[source]
The Cray T3D and T3E used Alpha processors. But it wasn't really shared memory: each node, with 1 (2?) CPUs, ran its own lightweight OS kernel. There were some libraries built on top of it (SHMEM) that sort of made it look a bit like shared memory, but not really. Mostly it was a machine for running MPI applications.

A decade or so later, they more or less recreated the architecture, this time with 64-bit Opteron CPUs, in the form of the 'Red Storm' supercomputer for Sandia. That then became commercially available as the XT3, and later the XT4/5/6.

40. p_l ◴[] No.45385736{4}[source]
EV6 CPUs could ostensibly use the same chipsets etc. as Athlon (in fact, some Alpha motherboards used Athlon chipsets). That was part of the strategy to increase volume.

Then came Compaq and its love for Intel.

41. p_l ◴[] No.45385755{3}[source]
Ex-Digital customers running OpenVMS held onto last-generation Alpha machines because they were substantially faster than the new Itanium ones in all practical uses. This is also why HP was finally forced to resurrect the nearly-complete EV7 chips.
42. p_l ◴[] No.45385774{4}[source]
Part of the issue was also that it was Cray's first proper MPP system, after being very much against MPP designs in the past.
43. hylaride ◴[] No.45386459{3}[source]
Linux was natively written for 32-bit CPUs, so it had no legacy cruft or binary-only software to support. IIRC, the first 64-bit port of Linux (to Alpha, I think) exposed a lot of code that needed to be rewritten because it assumed 32 bits and/or x86 specifics.
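
A minimal sketch (mine, not from the actual port) of the classic bug class those first 64-bit ports flushed out: stuffing a pointer into an int, which is harmless on ILP32 x86 and silently truncates on LP64 targets like Alpha:

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        int x = 42;
        int *p = &x;

        /* Broken 32-bit-era idiom: on LP64 the high half of the
         * pointer is thrown away.
         *   int addr = (int)p; */

        /* Portable form: an integer type guaranteed to be able to
         * hold a pointer. */
        uintptr_t addr = (uintptr_t)p;
        printf("%p stored as %ju\n", (void *)p, (uintmax_t)addr);
        return 0;
    }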
44. _flux ◴[] No.45388704{3}[source]
Could it have been a good target for e.g. a Java JIT? A JIT can instrument the code as it runs and then generate more optimal code from what it observes.
replies(1): >>45389271 #
45. tptacek ◴[] No.45388980{5}[source]
I am not disappointed. Having-but-not-really-having delay slots is my favorite thing here. Thank you, by the way!
46. philipkglass ◴[] No.45389271{4}[source]
I think you may be right. It's hard for me to find then-contemporary benchmarks from 20 years ago, but this snarky Register article mentions it indirectly:

https://www.theregister.com/2004/01/27/have_a_reality_check_...

> SPECjbb2000 (an important enterprise server benchmark): Itanic holds a slim (under 3%) lead over AMD64 at the 4-processor node size and another slim (under 4%) lead over POWER4+ at the 32-processor node size - hardly 'destroying' the competition, once again.

It was slightly faster than contemporary high-performance processors on Java, and really good at floating point. But it was also significantly more expensive than AMD64 for server applications, if you could scale your servers horizontally instead of vertically.

47. lambda_foo ◴[] No.45421678{5}[source]
The fact that Itanium had register windows was such a strange choice. I thought they had been shown not to be worthwhile, like branch delay slots in MIPS - basically hangovers from the early constraints of hardware implementations.