←back to thread

447 points stephenheron | 1 comments | | HN request time: 0s | source

Hi,

My daily workhorse is a M1 Pro that I purchased on release date, It has been one of the best tech purchases I have made, even now it really deals with anything I throw at it. My daily work load is regularly having a Android emulator, iOS simulator and a number of Dockers containers running simultaneously and I never hear the fans, battery life has taken a bit of a hit but it is still very respectable.

I wanted a new personal laptop, and I was debating between a MacBook Air or going for a Framework 13 with Linux. I wanted to lean into learning something new so went with the Framework and I must admit I am regretting it a bit.

The M1 was released back in 2020 and I bought the Ryzen AI 340 which is one of the newest 2025 chips from AMD, so AMD has 5 years of extra development and I had expected them to get close to the M1 in terms of battery efficiency and thermals.

The Ryzen is using a TSMC N4P process compared to the older N5 process, I managed to find a TSMC press release showing the performance/efficiency gains from the newer process: “When compared to N5, N4P offers users a reported +11% performance boost or a 22% reduction in power consumption. Beyond that, N4P can offer users a 6% increase in transistor density over N5”

I am sorely disappointed, using the Framework feels like using an older Intel based Mac. If I open too many tabs in Chrome I can feel the bottom of the laptop getting hot, open a YouTube video and the fans will often spin up.

Why haven’t AMD/Intel been able to catch up? Is x86 just not able to keep up with the ARM architecture? When can we expect a x86 laptop chip to match the M1 in efficiency/thermals?!

To be fair I haven’t tried Windows on the Framework yet it might be my Linux setup being inefficient.

Cheers, Stephen

Show context
BirAdam ◴[] No.45025602[source]
First, Apple did an excellent job optimizing their software stack for their hardware. This is something that few companies have the ability to do as they target a wide array of hardware. This is even more impressive given the scale of Apple's hardware. The same kernel runs on a Watch and a Mac Studio.

Second, the x86 platform has a lot of legacy, and each operation on x86 is translated from an x86 instruction into RISC-like micro-ops. This is an inherent penalty that Apple doesn't have pay, and it is also why Rosetta 2 can achieve "near native" x86 performance; both platform translate the x86 instructions.

Third, there are some architectural differences even if the instruction decoding steps are removed from the discussion. Apple Silicon has a huge out-of-order buffer, and it's 8-wide vs x86 4-wide. From there, the actual logic is different, the design is different, and the packaging is different. AMD's Ryzen AI Max 300 series does get close to Apple by using many of the same techniques like unified memory and tossing everything onto the package, where it does lose is due to all of the other differences.

In the end, if people want crazy efficiency Apple is a great answer and delivers solid performance. If people want the absolute highest performance, then something like Ryzen Threadripper, EPYC, or even the higher-end consumer AMD chips are great choices.

replies(5): >>45025823 #>>45025997 #>>45026202 #>>45026822 #>>45035587 #
zipityzi ◴[] No.45026202[source]
This seems mostly misinformed.

1) Apple Silicon outperforms all laptop CPUs in the same power envelope on 1T on industry-standard tests: it's not predominantly due to "optimizing their software stack". SPECint, SPECfp, Geekbench, Cinebench, etc. all show major improvements.

2) x86 also heavily relies on micro-ops to greatly improve performance. This is not a "penalty" in any sense.

3) x86 is now six-wide, eight-wide, or nine-wide (with asterisks) for decode width on all major Intel & AMD cores. The myth of x86 being stuck on four-wide has been long disproven.

4) Large buffers, L1, L2, L3, caches, etc. are not exclusive to any CPU microarchitecture. Anyone can increase them—the question is, how much does your core benefit from larger cache features?

5) Ryzen AI Max 300 (Strix Halo) gets nowhere near Apple on 1T perf / W and still loses on 1T perf. Strix Halo uses slower CPUs versus the beastly 9950X below:

Fanless iPad M4 P-core SPEC2017 int, fp, geomean: 10.61, 15.58, 12.85 AMD 9950X (Zen5) SPEC2017 int, fp, geomean: 10.14, 15.18, 12.41 Intel 285K (Lion Cove) SPEC2017 int, fp, geomean: 9.81, 12.44, 11.05

Source: https://youtu.be/2jEdpCMD5E8?t=185, https://youtu.be/ymoiWv9BF7Q?t=670

The 9950X & 285K eat 20W+ per core for that 1T perf; the M4 uses ~7W. Apple has a node advantage, but no node on Earth gives you 50% less power.

There is no contest.

replies(6): >>45026508 #>>45026553 #>>45026632 #>>45027077 #>>45027619 #>>45029677 #
BirAdam ◴[] No.45026632[source]
1. Apple’s optimizations are one point in their favor. XNU is good, and Apple’s memory management is excellent.

2. X86 micro-ops vs ARM decode are not equivalent. X86’s variable length instructions make the whole process far more complicated than it is on something like ARM. This is a penalty due to legacy design.

3. The OP was talking about M1. AFAIK, M4 is now 10-wide, and most x86 is 6-wide (Ryzen 5 does some weird stuff). X86 was 4-wide at the time of M1’s introduction.

4. M1 has over 600 reorder buffer registers… it’s significantly larger than competitors.

5. Close relative to x86 competitors.

replies(1): >>45032876 #
1. 1000100_1000101 ◴[] No.45032876[source]
> 4. M1 has over 600 reorder buffer registers… it’s significantly larger than competitors.

And? Are you saying neither Intel nor AMD engineers were able to determine that this was a bottleneck worth chasing? The point was, anybody could add more cache, rename, reorder or whatever buffers they wanted to... it's not Apple secret-sauce.

If all the competition knew they were leaving all this performance/efficiency on the table despite there being a relatively simple fix, that's on them. They got overtaken by a competitor with a better offering.

If all the competition didn't realize they were leaving all this performance/efficiency on the table despite there being a relatively simple fix, that's also on them. They got overtaken by a competitor with better offering AND more effective engineers.