
331 points by giuliomagnifico | 2 comments
bigstrat2003 No.45377613
I remember at the time thinking it was really silly for Intel to release a 64-bit processor that broke compatibility, and I was very glad AMD kept it. Years later I learned about kernel writing, and I now get why Intel tried to break with the old: the compatibility hacks piled up on x86 are truly awful. But ultimately, customers don't care about that; they just want their stuff to run.
replies(5): >>45377925 #>>45379301 #>>45380247 #>>45385323 #>>45386390 #
wvenable No.45379301
Intel might have been successful with the transition if they hadn't decided to go with such a radically different, real-world-untested architecture for Itanium.
replies(2): >>45379461 #>>45380469 #
pixl97 No.45379461
Well, that, and Itanium was eye-wateringly expensive while a standard PC was much cheaper at similar or faster speeds.
replies(1): >>45380251 #
Tsiklon No.45380251
I think Itanium was a remarkable success in some other ways. Intel utterly destroyed the proprietary Unix workstation market with it: HP-UX, IRIX, AIX, Solaris.

Itanium sounded the death knell for all of them.

The only Unix to survive with any market share is macOS (arguably because of its lateness to the party), and it has only relatively recently gone back to a more bespoke architecture.

replies(5): >>45380339 #>>45380406 #>>45382516 #>>45383193 #>>45388301 #
icedchai No.45380339
I'd argue it was Linux (on x86) and the dot-com crash that destroyed the workstation market, not Itanium. The early 2000s were awash in used workstation gear, especially Sun. I've never seen anyone with an Itanium box.
replies(3): >>45380551 #>>45381130 #>>45387724 #
phire No.45381130
While Linux helped, I'd argue the true factor is that x86 failed to die as projected.

The common attitude in the 80s and 90s was that legacy ISAs like 68k and x86 had no future. They had zero chance to keep up with the innovation of modern RISC designs. But not only did x86 keep up, it was actually outperforming many RISC ISAs.

The true factor is out-of-order execution. Some contemporary RISC designs were out-of-order too (especially Alpha, and PowerPC to a lesser extent), but both AMD and Intel were forced to go all-in on the concept in a desperate attempt to keep the legacy x86 ISA going.

Turns out large out-of-order designs were the correct path (mostly because OoO has the side effect of being able to reorder memory accesses and execute them in parallel), and AMD/Intel had a bit of a head start, a pre-existing customer base, and plenty of revenue for R&D.
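(To make that memory point concrete, here's a minimal C sketch of my own; all names and sizes are mine. Every load in a pointer chase depends on the previous one, so an OoO core can't speed up a single chain, but it can overlap the cache misses of two independent chains, which typically makes chase_two() close to twice as fast as chase_one() for the same total number of loads. A simple in-order core that stalls at the first miss gets no such overlap.)

    /* Memory-level parallelism sketch: one dependent pointer chase
     * vs. two independent ones, same total number of loads. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N (1u << 22)                 /* 4M entries, well beyond cache sizes */

    static size_t next_a[N], next_b[N];

    /* Sattolo's algorithm: a random permutation that is a single N-cycle,
     * so a chase visits every slot before repeating. */
    static void build_chain(size_t *next) {
        for (size_t i = 0; i < N; i++) next[i] = i;
        for (size_t i = N - 1; i > 0; i--) {
            size_t j = (size_t)rand() % i;           /* 0 <= j < i */
            size_t t = next[i]; next[i] = next[j]; next[j] = t;
        }
    }

    static size_t chase_one(const size_t *next, size_t steps) {
        size_t i = 0;
        while (steps--) i = next[i];                 /* one serial chain of loads */
        return i;
    }

    static size_t chase_two(const size_t *na, const size_t *nb, size_t steps) {
        size_t i = 0, j = 0;
        while (steps--) { i = na[i]; j = nb[j]; }    /* two independent chains */
        return i + j;
    }

    int main(void) {
        build_chain(next_a);
        build_chain(next_b);

        clock_t t0 = clock();
        size_t r1 = chase_one(next_a, 2 * N);        /* 2N dependent loads */
        clock_t t1 = clock();
        size_t r2 = chase_two(next_a, next_b, N);    /* 2N loads across 2 chains */
        clock_t t2 = clock();

        printf("one chain:  %.2fs (r=%zu)\n", (double)(t1 - t0) / CLOCKS_PER_SEC, r1);
        printf("two chains: %.2fs (r=%zu)\n", (double)(t2 - t1) / CLOCKS_PER_SEC, r2);
        return 0;
    }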

IMO, Itanium failed not because it was a bad design, but because it was on the wrong path. Itanium was an attempt to achieve roughly the same end goal as OoO, but with a completely in-order design relying on static scheduling. It had massive amounts of complexity just to let it reorder memory reads. In an alternative universe where OoO (aka dynamic scheduling) had failed, Itanium might actually have been a good design.

Anyway, by the early 2000s there just wasn't much advantage to a RISC workstation (or a RISC server). x86 could keep up, kept getting faster, and was often cheaper. And there were massive advantages to having the same ISA across your servers, workstations, and desktops.

replies(2): >>45381317 #>>45382983 #
stevefan1999 No.45382983
> The true factor is out-of-order execution.

I'm pressing X: the doubt button.

I would argue that speculative execution/branch prediction and wider pipelines, both of which OoO largely benefited from, mattered more than OoO itself as the deciding factor. In fact, I believe improvements in the semiconductor manufacturing process node contributed more to the IPC gains than OoO did.
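(For what it's worth, the standard way to see the predictor's contribution in isolation is the classic sorted-vs-unsorted microbenchmark. This C sketch is mine, with arbitrary sizes; it may need a low optimization level such as -O1, since some compilers turn the branch into a branchless conditional move and hide the effect.)

    /* Same work over sorted vs. unsorted data: sorting makes the branch
     * predictable, typically several times faster, with an identical
     * instruction mix. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N (1 << 20)

    static int cmp_int(const void *a, const void *b) {
        return *(const int *)a - *(const int *)b;   /* values are 0..255 */
    }

    static long sum_big(const int *v, int n, int reps) {
        long sum = 0;
        for (int r = 0; r < reps; r++)
            for (int i = 0; i < n; i++)
                if (v[i] >= 128) sum += v[i];       /* hard to predict on random data */
        return sum;
    }

    int main(void) {
        static int data[N];
        for (int i = 0; i < N; i++) data[i] = rand() % 256;

        clock_t t0 = clock();
        long a = sum_big(data, N, 100);
        clock_t t1 = clock();

        qsort(data, N, sizeof data[0], cmp_int);    /* same values, predictable branch */
        clock_t t2 = clock();
        long b = sum_big(data, N, 100);
        clock_t t3 = clock();

        printf("unsorted: %.2fs (sum=%ld)\n", (double)(t1 - t0) / CLOCKS_PER_SEC, a);
        printf("sorted:   %.2fs (sum=%ld)\n", (double)(t3 - t2) / CLOCKS_PER_SEC, b);
        return 0;
    }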

replies(1): >>45383517 #
phire No.45383517
To be clear, when I (and most people) say OoO, I don't mean just the act of executing instructions out-of-order. I mean the whole modern paradigm of "complex branch predictors, controlling wide front-ends, feeding schedulers with wide back-ends and hundreds or even thousands of instructions in flight".

It's a little annoying that OoO is overloaded in this way. I have seen some people suggesting we should be calling these designs "Massively-Out-of-Order" or "Great-Big-Out-of-Order" in order to be more specific, but that terminology isn't in common use.

And yes, there are some designs out there which are technically out-of-order, but don't count as MOoO/GBOoO. The early PowerPC cores come to mind.

It's not that executing instructions out-of-order benefits from complex branch prediction and wide execution units; rather, OoO is what made it viable to start using wide execution units and complex branch prediction in the first place.

A simple in-order core just can't extract that much parallelism; the benefits drop off quickly beyond two-wide superscalar. And accurate branch prediction is of limited usefulness when the pipeline is that short.

There are really only two ways to extract more parallelism. You either do complex out-of-order scheduling (aka dynamic scheduling), or you take the VLIW approach and try to solve it with static scheduling, like the Itanium. They really are just two sides of the same "I want a wide core" coin.

And we all know how badly the Itanium failed.
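(A concrete sketch of the static-scheduling side, mine rather than anything Itanium actually shipped: chain1 below is one serial floating-point dependency chain, bound by add latency no matter how wide the machine is; chain4 does the same number of additions split into four independent chains, which a wide core can overlap. A VLIW/EPIC compiler has to bake this kind of transformation in at build time; an OoO core finds equivalent parallelism at run time, whenever it actually exists in the instruction stream.)

    /* One FP dependency chain vs. four independent ones, same add count.
     * Without -ffast-math the compiler can't reassociate FP adds, so
     * chain1 stays latency-bound while chain4 keeps several adders busy. */
    #include <stdio.h>
    #include <time.h>

    #define STEPS 400000000L

    static double chain1(long n) {
        double s = 0.0;
        for (long i = 0; i < n; i++)
            s += 1.0000001;                   /* each add waits on the last */
        return s;
    }

    static double chain4(long n) {
        double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        for (long i = 0; i < n; i += 4) {     /* four chains issue in parallel */
            s0 += 1.0000001;
            s1 += 1.0000001;
            s2 += 1.0000001;
            s3 += 1.0000001;
        }
        return (s0 + s1) + (s2 + s3);
    }

    int main(void) {
        clock_t t0 = clock();
        double a = chain1(STEPS);
        clock_t t1 = clock();
        double b = chain4(STEPS);
        clock_t t2 = clock();
        printf("1 chain:  %.2fs (%.1f)\n", (double)(t1 - t0) / CLOCKS_PER_SEC, a);
        printf("4 chains: %.2fs (%.1f)\n", (double)(t2 - t1) / CLOCKS_PER_SEC, b);
        return 0;
    }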

replies(1): >>45383734 #
stevefan1999 No.45383734
> I mean the whole modern paradigm of "complex branch predictors, controlling wide front-ends, feeding schedulers with wide back-ends and hundreds or even thousands of instructions in flight".

Ah, the philosophy of having the CPU execute out of order, you mean.

> A simple in-order core simply can't extract that much parallelism

While yes, it's also worth noting that it has no data hazards when a pipeline simply doesn't exist at all, and thus there's no need for implicit pipeline bubbles or delay slots.

> And accurate branch prediction is of limited usefulness when the pipeline is that short.

You can also use a software virtual machine to turn an out-of-order CPU into something that basically runs in-order code, and see how slowly that goes. That's why JIT VMs such as HotSpot and GraalVM for the JVM platform, RyuJIT for CoreCLR, and TurboFan for V8 are so much faster: once they compile to native instructions, the branch predictor can finally kick in.
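(A toy illustration of why; this switch-dispatch interpreter is my sketch, not any real VM's code. Every bytecode op funnels through the one data-dependent dispatch branch, historically the pattern predictors handled worst; a JIT replaces it with ordinary native branches, one per source-level site.)

    /* Toy stack-machine interpreter: the switch is the single hot,
     * data-dependent branch that every VM instruction funnels through. */
    #include <stdio.h>

    enum { OP_PUSH, OP_ADD, OP_MUL, OP_PRINT, OP_HALT };

    static void run(const int *code) {
        int stack[64], sp = 0;
        for (int pc = 0; ; ) {
            switch (code[pc++]) {                       /* dispatch branch */
            case OP_PUSH:  stack[sp++] = code[pc++];          break;
            case OP_ADD:   sp--; stack[sp-1] += stack[sp];    break;
            case OP_MUL:   sp--; stack[sp-1] *= stack[sp];    break;
            case OP_PRINT: printf("%d\n", stack[sp-1]);       break;
            case OP_HALT:  return;
            }
        }
    }

    int main(void) {
        /* (2 + 3) * 4 */
        const int prog[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD,
                             OP_PUSH, 4, OP_MUL, OP_PRINT, OP_HALT };
        run(prog);
        return 0;
    }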

> like the Itanium

> And we all know how badly the Itanium failed.

Itanium is not exactly VLIW. It is an EPIC [1] fail though.

[1]: https://en.wikipedia.org/wiki/Explicitly_parallel_instructio...