←back to thread

128 points ksec | 1 comments | | HN request time: 0.421s | source
Show context
dragontamer ◴[] No.42751521[source]
Triple decoder is one unique effect. The fact that Intel managed to get them lined up for small loops to do 9x effective instruction issue is basically miraculous IMO. Very well done.

Another unique effect is L2 shared between 4 cores. This means that thread communications across those 4 cores has much lower latencies.

I've had lots of debates with people online about this design vs Hyperthreading. It seems like the overall discovery from Intel is that highly threaded tasks use less resources (cache, ROPs, etc. etc).

Big cores (P cores or AMD Zen5) obviously can split into 2 hyperthread, but what if that division is still too big? E cores are 4 threads of support in roughly the same space as 1 Pcore.

This is because L2 cache is shared/consolidated, and other resources (ROP buffers, register files, etc. etc.) are just all so much smaller on the Ecore.

It's an interesting design. I'd still think that growing the cores to 4way SMT (like Xeon Phi) or 8way SMT (POWER10) would be a more conventional way to split up resources though. But obviously I don't work at Intel or can make these kinds of decisions.

replies(8): >>42751667 #>>42751930 #>>42752001 #>>42752140 #>>42752196 #>>42752200 #>>42753025 #>>42753142 #
adrian_b ◴[] No.42752196[source]
While the frontend of Intel Skymont, which includes instruction fetching and decoding, is very original and unlike to that of any other CPU core, the backend of Skymont, which includes the execution units, is extremely similar to that of Arm Cortex-X4 (which is a.k.a. Neoverse V3 in its server variant and as Neoverse V3AE in its automotive variant).

This similarity consists in the fact that both Intel Skymont and Arm Cortex-X4 have the same number of execution units of each kind (and there are many kinds of execution units).

Therefore it can be expected that for any application whose performance is limited by the CPU core backend, the CPU cores Intel Skymont and Arm Cortex-X4 (or Neoverse V3) should have very similar performances.

Moreover, Intel Skymont and Arm Cortex-X4 have the same die area, i.e. around 1.7 square mm (including with both cores 1 MB of L2 cache in this area). Therefore the 2 cores not only should have about the same performance for backend-limited applications, but they also have the same cost.

Before Skymont, all the older Intel Atom cores had been designed to compete with the medium-size Arm Cortex-A7xx cores, even if the Intel Atom cores have always lagged in performance Cortex-A7xx by a year or two. For instance Intel Tremont had a very similar performance to Arm Cortex-A76, while Intel Gracemont and Crestmont have an extremely similar core backend with the series of Cortex-A78 to Cortex-A725 (like Gracemont and Crestmont, the 5 cores in the series Cortex-A78, Cortex-A710, Cortex-A715, Cortex-A720 and Cortex-A725 have only insignificant differences in the execution units).

With Skymont, Intel has made a jump in E-core size, positioning it as a match for Cortex-X, not for Cortex-A7xx, like its predecessors.

replies(1): >>42755127 #
ksec ◴[] No.42755127[source]
>positioning it as a match for Cortex-X

Well the recent Cortex X5 or 925 is already at around 3.4mm2 so that comparison isn't exactly accurate. But I would love to test and see results on Skymont compared to X4. But I dont think they are available yet ( as an individual core ).

I am really looking forward to Clearwater Forest which is Skymont on 18A for Server.

And I know I am going to sound crazy but I wouldn't mind a small SoC based on Skymont and Xe2 Graphics for Smartphone to Tablets.

replies(2): >>42756022 #>>42763562 #
phonon ◴[] No.42763562[source]
> Clearwater Forest which is Skymont on 18A for Server.

Clearwater Forest will be using a further generation improved E-core, Darkmont, which will also sit on top of large local caches using Foveros Direct 3D (like AMD's X3D design). [1]

Likely Darkmont will be a server tweaked version of Skymont, but there is no public info yet available.

This is possibly the critical product which will determine if Intel will be viable from a manufacturing and design perspective...If it gets released in the next 6-9 months with good thermals, IPC, and clock speeds, Intel will have a major winner on its hands. If not....

[1] https://www.intel.com/content/dam/www/central-libraries/us/e...

replies(1): >>42770328 #
1. ksec ◴[] No.42770328[source]
My guess is that Darkmont is simply Skymont with a different name due to being 18A with different cache design.

But I hope Intel prove me wrong and bring in something even more exciting.