1080 points by antipaul | 9 comments
maz1b No.25065664
This is pretty crazy to see, even if the full story isn't clear yet. A base-level MacBook Air is taking the crown from the best MacBook Pro. Wow. SVP Johny Srouji and the whole Apple hardware and silicon team have been smashing it for years now.

For what it's worth, I have a fully specced-out 16-inch MacBook Pro with the AMD Radeon Pro 5600M, and even with that I regularly hit 100% usage on the card, not to mention the fan noise.

Looking forward to a version from Apple that is made for actual professionals, but I imagine these introductory M1-based devices are going to be great for the vast majority of people.

bigdict No.25066381
I wonder whether the M1 dominates an i9-9980HK on multithreaded workloads that make full use of the available SIMD units. Does the M1 also dominate in peak theoretical FLOPS?

rbanffy No.25067645
M1 is not magic and can't break the laws of physics. The i9's SMT makes better use of its silicon and will probably bring performance closer. OTOH, the M1 has fast memory that the i9 can't match.

I still bet on the i9, but it'd be interesting to run a test.

pg314 No.25068097
What are the laws of physics that would be broken in this case?

GeekyBear No.25068287
>M1 is not magic and can't break the laws of physics.

Anandtech's deep dive provides several examples of advances in Apple's core design that didn't involve magic or breaking the laws of physics. For example...

Instruction Decode:

>What really defines Apple’s Firestorm CPU core from other designs in the industry is just the sheer width of the microarchitecture. Featuring an 8-wide decode block, Apple’s Firestorm is by far the current widest commercialized design in the industry. Other contemporary designs such as AMD’s Zen(1 through 3) and Intel’s µarch’s, x86 CPUs today still only feature a 4-wide decoder designs

Instruction Re-order Buffer Size:

>A +-630 deep ROB is an immensely huge out-of-order window for Apple’s new core, as it vastly outclasses any other design in the industry. Intel’s Sunny Cove and Willow Cove cores are the second-most “deep” OOO designs out there with a 352 ROB structure, while AMD’s newest Zen3 core makes due with 256 entries, and recent Arm designs such as the Cortex-X1 feature a 224 structure.

Number of Execution Units:

>On the Integer side, we find at least 7 execution ports for actual arithmetic operations. These include 4 simple ALUs capable of ADD instructions, 2 complex units which feature also MUL (multiply) capabilities, and what appears to be a dedicated integer division unit.

>On the floating point and vector execution side of things, the new Firestorm cores are actually more impressive as they [feature] a 33% increase in capabilities, enabled by Apple’s addition of a fourth execution pipeline.

https://www.anandtech.com/show/16226/apple-silicon-m1-a14-de...
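
As a rough sketch of the peak-FLOPS arithmetic bigdict asked about: theoretical peak is commonly estimated as cores × clock × FP/vector pipelines × SIMD lanes per pipeline × FLOPs per lane per cycle (2 if each lane can do a fused multiply-add). Holding everything else fixed, going from three to four pipelines gives the 4/3 ratio behind the quoted 33%. The clock speed and lane count below are hypothetical placeholders, not M1 or i9 specifications, and real code rarely gets near this ceiling.

```python
def peak_flops(cores: int, clock_hz: float, fp_pipes: int,
               lanes_per_pipe: int, flops_per_lane: int = 2) -> float:
    """Theoretical peak FLOP/s, assuming every pipe issues one SIMD op per cycle.

    flops_per_lane defaults to 2, i.e. one fused multiply-add per lane per cycle.
    """
    return cores * clock_hz * fp_pipes * lanes_per_pipe * flops_per_lane


# Hypothetical single core; only the pipeline count differs between the two.
three_pipes = peak_flops(cores=1, clock_hz=3.0e9, fp_pipes=3, lanes_per_pipe=4)
four_pipes = peak_flops(cores=1, clock_hz=3.0e9, fp_pipes=4, lanes_per_pipe=4)

print(f"{four_pipes / three_pipes - 1:.0%}")  # -> 33%
```

Plugging real per-core figures for both chips into the same formula gives the on-paper comparison; only measurements show how close a given workload gets.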

cesarb No.25069795
> Featuring an 8-wide decode block, Apple’s Firestorm is by far the current widest commercialized design in the industry. Other contemporary designs such as AMD’s Zen(1 through 3) and Intel’s µarch’s, x86 CPUs today still only feature a 4-wide decoder designs

This is one place where the 64-bit ARM ISA design shines: all instructions are exactly 4 bytes wide and always 4-byte aligned, so it's easy to build a very wide decoder. There's no need to compute each instruction's length and realign the instruction stream before decoding.
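
A toy illustration of the fixed-width decoding point: with 4-byte instructions every start offset is known in advance, so a wide decoder can examine all slots independently, whereas with variable-length instructions each start depends on the previous instruction's length, forming a serial chain. The `toy_length` encoding is hypothetical (it is not the x86 encoding), and real x86 front ends spend extra logic, such as length pre-decoding and micro-op caches, to work around this chain.

```python
from typing import Callable, List


def fixed_width_boundaries(code: bytes, width: int = 4) -> List[int]:
    # Every start offset is simply i * width, so all of them can be
    # computed independently (in hardware: one decoder per slot, in parallel).
    return list(range(0, len(code) - width + 1, width))


def variable_length_boundaries(code: bytes,
                               length_of: Callable[[bytes, int], int]) -> List[int]:
    # Each start depends on the length of the instruction before it,
    # so boundaries can only be discovered one after another.
    starts: List[int] = []
    pc = 0
    while pc < len(code):
        starts.append(pc)
        pc += length_of(code, pc)
    return starts


def toy_length(code: bytes, pc: int) -> int:
    # Hypothetical encoding: low 4 bits of the first byte give the length (min 1).
    return max(1, code[pc] & 0x0F)


fixed = fixed_width_boundaries(bytes(16))
var = variable_length_boundaries(
    bytes([0x02, 0xAA, 0x03, 0xBB, 0xCC, 0x01, 0x04, 0, 0, 0]), toy_length)
print(fixed)  # -> [0, 4, 8, 12]
print(var)    # -> [0, 2, 5, 6]
```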

rbanffy No.25071619
x86 needs more complicated logic than ARM to handle the instruction stream, so ARM frees more of its silicon for things like better reordering and more execution units. OTOH, SMT somewhat mitigates reordering stalls by working on more than one instruction stream at once. I'd say the 16-thread chip will end up faster overall than the 8-core one, as long as cache misses on the x86's slower memory bus don't impose a huge penalty. The i9-9980HK is also two generations behind, which doesn't help it much.
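
The SMT argument can be sketched with a deliberately crude model: a hypothetical core that issues one instruction per cycle, stalls expressed as fixed cycle counts, and round-robin choice among ready threads. With a second thread, the core fills some of the cycles the first one spends stalled. It does not model ROB capacity, memory bandwidth, or contention between SMT threads for shared resources, so the numbers only show the direction of the effect.

```python
from typing import List


def run(threads: List[List[int]]) -> int:
    """Cycles needed to issue every instruction on a toy 1-wide core.

    Each thread is a list of per-instruction stall counts: after issuing an
    instruction with stall s, that thread cannot issue again for s cycles.
    Each cycle the core issues from the first ready thread, rotating the
    starting point round-robin so threads take turns.
    """
    pcs = [0] * len(threads)       # next instruction index per thread
    ready_at = [0] * len(threads)  # first cycle each thread may issue again
    cycle = 0
    start = 0
    while any(pc < len(t) for pc, t in zip(pcs, threads)):
        for k in range(len(threads)):
            i = (start + k) % len(threads)
            if pcs[i] < len(threads[i]) and ready_at[i] <= cycle:
                ready_at[i] = cycle + 1 + threads[i][pcs[i]]
                pcs[i] += 1
                start = (i + 1) % len(threads)
                break
        cycle += 1  # a cycle passes whether or not anything issued
    return cycle


pattern = [0, 0, 0, 3] * 2             # three quick ops, then one that stalls 3 cycles
print(run([pattern + pattern]))        # one thread, 16 instructions -> 25
print(run([pattern, pattern]))         # two SMT threads, same 16 instructions -> 18
```

The same 16 instructions take 25 cycles on one thread but 18 when split across two, because the core issues from the other thread during most of the stall cycles.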

When I said there is no magic, I was warning that we shouldn't expect huge speedups or a crushing advantage, at least not for long. The edge M1 has is due to a simpler ISA (which is less demanding to run efficiently, freeing more resources for optimization and execution) and a faster memory interface (which makes an L3 miss less of a punishment). This fast memory interface also limits it, for now, to 16GB of memory. If your dataset is 17GB, it'll suffer. Another difference is that all of the i9 cores are designed to be fast, whereas only 4 cores of the M1 are. This added flexibility can be put to good use by moving CPU-bound processes to the big cores and IO-bound and low-priority ones to the little ones.
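
The big/little placement idea can be sketched as a simple greedy policy. The `Task` fields and the placement rule are hypothetical, not how macOS actually schedules threads; the default 4 big + 4 little split mirrors the M1 configuration mentioned in the comment.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Task:
    name: str
    cpu_bound: bool     # hypothetical flag: mostly computing rather than waiting on I/O
    low_priority: bool  # hypothetical flag: background work nobody is waiting on


def place(tasks: List[Task], big_cores: int = 4,
          little_cores: int = 4) -> Tuple[List[List[str]], List[List[str]]]:
    """Give each task to the shortest run queue in its preferred pool.

    CPU-bound, latency-sensitive tasks prefer big cores; I/O-bound or
    low-priority tasks prefer little cores.
    """
    big: List[List[str]] = [[] for _ in range(big_cores)]
    little: List[List[str]] = [[] for _ in range(little_cores)]
    for task in tasks:
        pool = big if task.cpu_bound and not task.low_priority else little
        min(pool, key=len).append(task.name)  # ties go to the lowest-numbered core
    return big, little


tasks = [
    Task("compile", cpu_bound=True, low_priority=False),
    Task("mail-sync", cpu_bound=False, low_priority=True),
    Task("render", cpu_bound=True, low_priority=False),
    Task("indexer", cpu_bound=True, low_priority=True),
]
big, little = place(tasks)
print(big)     # -> [['compile'], ['render'], [], []]
print(little)  # -> [['mail-sync'], ['indexer'], [], []]
```

A real scheduler would also migrate tasks as their behavior changes and spill work onto the other pool when one is saturated; this sketch does neither.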

In the end, they are very different chips (in design and TDP). It'd be interesting to compare them with actual measurements, as well as newer Intel ones.

rbanffy No.25071733
> advances in Apple's core design that didn't involve magic or breaking the laws of physics.

That's exactly what I said. It's faster, but not an order of magnitude faster, and different workloads will perform differently depending on a multitude of factors (even if benchmarks don't). Do not expect it to outperform a not-too-old top-of-the-line mobile CPU by a large margin.

GeekyBear No.25072305
The current gen iPhone chip using the same cores literally outperforms anything Intel makes on a per core basis.

Zen 3 slightly outperforms the iPhone chip, but the iPhone chip runs its clocks slower to stay within a 5-watt power draw.

https://www.anandtech.com/show/16226/apple-silicon-m1-a14-de...

So, yes. Expect it to outperform Tiger Lake and Zen 3, at least on a per core basis.

rbanffy No.25073338
Remember, the Intel part has 8 fast cores while the M1 has 4 (plus 4 puny ones, which don't really count). The Intel part also uses SMT to squeeze out some extra parallelism that the reordering plumbing can't.

GeekyBear No.25074153
Yes, Intel makes parts with more cores, but their entry-level chips only have two cores.

Apple chips with more cores will come in time as well.

It's the per core performance, especially at a given power draw, that matters going forward.