ARM is great. Those M-series Macs are the only machines I could buy used and put Linux on.
My daily workhorse is an M1 Pro that I purchased on release day, and it has been one of the best tech purchases I have made; even now it handles anything I throw at it. My daily workload regularly has an Android emulator, an iOS simulator, and a number of Docker containers running simultaneously, and I never hear the fans. Battery life has taken a bit of a hit, but it is still very respectable.
I wanted a new personal laptop and was debating between a MacBook Air and a Framework 13 running Linux. I wanted to lean into learning something new, so I went with the Framework, and I must admit I am regretting it a bit.
The M1 was released back in 2020, while the Ryzen AI 340 I bought is one of AMD's newest 2025 chips. That gives AMD five years of extra development, so I had expected them to get close to the M1 in terms of battery efficiency and thermals.
The Ryzen is built on TSMC's N4P process, compared to the M1's older N5 process. I managed to find a TSMC press release describing the performance/efficiency gains of the newer node: “When compared to N5, N4P offers users a reported +11% performance boost or a 22% reduction in power consumption. Beyond that, N4P can offer users a 6% increase in transistor density over N5.”
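Some quick back-of-the-envelope arithmetic on those quoted figures (my own rough sketch, assuming the gains are measured at equal power and equal performance respectively):

    # Rough sketch of what TSMC's quoted N4P-vs-N5 figures imply for perf/W.
    perf_gain_at_same_power = 1.11         # "+11% performance boost"
    power_at_same_perf      = 1.00 - 0.22  # "22% reduction in power consumption"

    perf_per_watt_gain = 1.0 / power_at_same_perf  # ~1.28x
    print(f"N4P vs N5: ~{perf_per_watt_gain:.2f}x perf/W at the same performance, "
          f"or {perf_gain_at_same_power:.2f}x performance at the same power")

So the newer node alone is only worth roughly 1.1-1.3x; anything beyond that would have to come from those five years of architectural work.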
I am sorely disappointed; using the Framework feels like using an older Intel-based Mac. If I open too many tabs in Chrome I can feel the bottom of the laptop getting hot, and if I open a YouTube video the fans will often spin up.
Why haven't AMD/Intel been able to catch up? Is x86 simply unable to keep up with the ARM architecture? When can we expect an x86 laptop chip to match the M1 in efficiency and thermals?!
To be fair, I haven't tried Windows on the Framework yet, so it might just be my Linux setup being inefficient.
Cheers, Stephen
> ARM is great. Those M-series Macs are the only machines I could buy used and put Linux on.
That was the case for most of the history of PPC Macs, too (I owned two during those years).
Their claim that ARM decoders are just as complex wasn't true then and is even less true now. ARM cut decoder size by about 75% from the A710 to the A715 just by dropping legacy 32-bit support. Considering that x86 is far more complex than 32-bit ARM, the difference between an x86 and an ARM decoder implementation is massive.
They abuse the decoder power paper (and that paper itself draws a conclusion its own data doesn't support). The data shows that on integer/ALU workloads, some 22% of total core power goes to the decoder. And since 89% of all instructions across the entire Ubuntu repos are just 12 integer/ALU instructions, we can infer that the decoder's power cost is significant (I'd consider nearly a quarter of the total power budget significant in any case).
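To make that inference concrete, here is a rough weighting; only the 22% and 89% figures come from the sources above, and it generously assumes decode is free for everything else while treating the static instruction mix as a stand-in for the dynamic one:

    # Rough lower bound on the decoder's share of total core power.
    decoder_share_int_alu = 0.22  # decoder share of core power on integer/ALU work (paper)
    int_alu_fraction      = 0.89  # share of instructions that are the 12 integer/ALU ops (Ubuntu repos)
    decoder_share_other   = 0.0   # generous assumption: decode costs nothing for the rest

    weighted_floor = (int_alu_fraction * decoder_share_int_alu
                      + (1 - int_alu_fraction) * decoder_share_other)
    print(f"Decoder is at least ~{weighted_floor:.0%} of core power under these assumptions")  # ~20%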
The x86 decoder situation has only gotten worse, with Golden Cove (and its 6 decoders) infamous for its power draw, and AMD fearing decoder power so much that they opted for a super-complex dual 4-wide decoder setup. If decoder power didn't matter, they'd be building 10-wide decoders like the ARM designers do.
The claim that ARM uses uops too is somewhere between a red herring and a false equivalency. ARM uops are certainly less complex to generate (otherwise ARM would have kept the uop cache around), and ARM instructions being inherently less complex means that uop encoding is also going to be simpler for a given uarch than it is for x86.
They then make an argument that proves too much when they say ARM has bloat too. If bloat doesn't matter, why did ARM create an entirely new ISA that ditches backward compatibility? Why take any risk with their ecosystem if there's no reward?
They also skip over the fact that objectively bad design exists. NOBODY defends branch delay slots. They are universally considered an active impediment to high-performance designs, with ISAs like MIPS going so far as to add duplicate branch instructions without delay slots just to speed things up. You can't accept that the ISA clearly matters here and simultaneously argue that the ISA never makes any difference at all.
The "all ISAs get bloated over time" is sheer ignorance. x86 has roots going back to the early 1970s before we'd figured out computing. All the basics of CPU design are now stable and haven't really changed in 30+ years. x86 has x87 which has 80-bits because IEEE 754 didn't exist yet. Modern ISAs aren't repeating that mistake. x86 having 8 registers isn't a mistake they are going to make. Neither is 15 different 128-bit SIMD extensions or any of the many other bloated mess-ups x86 has made over the last 50+ years. There may be mistakes, but they are almost certainly going to be on fringe things. In the meantime, the core instructions will continue to be superior to x86 forever.
They also fail to address implementation complexity. Some of x86's weirdness, like its stricter memory-ordering rules, gets dragged through the entire system and complicates everything it touches. If that results in just 10% higher cost and 10% longer development time, a RISC company could develop a chip for $5.4B over 4.5 years instead of $6B over 5 years. That is a massive saving and a much lower opportunity cost, and it gives a compounding head-start over the x86 competitor that can be spent either on hitting the market sooner or on making even larger performance jumps each generation.
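Spelling out the arithmetic in that example (the $6B over 5 years baseline is purely illustrative, not a real program budget):

    # Illustrative arithmetic for the 10% cost / 10% schedule example above.
    x86_cost_bn, x86_years = 6.0, 5.0  # hypothetical baseline x86 program
    penalty = 0.10                     # assumed extra cost/time from x86 complexity

    risc_cost_bn = x86_cost_bn * (1 - penalty)  # $5.4B
    risc_years   = x86_years * (1 - penalty)    # 4.5 years
    print(f"RISC: ${risc_cost_bn:.1f}B over {risc_years:.1f} years vs "
          f"${x86_cost_bn:.1f}B over {x86_years:.1f} years "
          f"(~{x86_years - risc_years:.1f} years of head-start per generation)")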
Finally, optimizing RISC-V code is inherently easier and faster than optimizing x86 code because there is less weirdness to work around. RISC-V basically has one way to do a given thing, and that way is always the optimized one, while x86 often has several ways to do the same thing, each with tradeoffs that make sense in different scenarios.
As for PPC, Apple simply didn't sell enough laptops to pay Motorola enough to keep the designs competitive.
Today, Apple's MacBooks and iPhones move nearly 220M chips per year; for comparison, total laptop sales last year were around 260M. If Apple had Motorola make a chip today, Motorola would have the money to build a PPC chip that could compete with, and surpass, what x86 offers.