←back to thread

128 points ksec | 3 comments | | HN request time: 0.556s | source
Show context
dragontamer ◴[] No.42751521[source]
Triple decoder is one unique effect. The fact that Intel managed to get them lined up for small loops to do 9x effective instruction issue is basically miraculous IMO. Very well done.

Another unique effect is L2 shared between 4 cores. This means that thread communications across those 4 cores has much lower latencies.

I've had lots of debates with people online about this design vs Hyperthreading. It seems like the overall discovery from Intel is that highly threaded tasks use less resources (cache, ROPs, etc. etc).

Big cores (P cores or AMD Zen5) obviously can split into 2 hyperthread, but what if that division is still too big? E cores are 4 threads of support in roughly the same space as 1 Pcore.

This is because L2 cache is shared/consolidated, and other resources (ROP buffers, register files, etc. etc.) are just all so much smaller on the Ecore.

It's an interesting design. I'd still think that growing the cores to 4way SMT (like Xeon Phi) or 8way SMT (POWER10) would be a more conventional way to split up resources though. But obviously I don't work at Intel or can make these kinds of decisions.

replies(8): >>42751667 #>>42751930 #>>42752001 #>>42752140 #>>42752196 #>>42752200 #>>42753025 #>>42753142 #
1. Zardoz84 ◴[] No.42752001[source]
> I've had lots of debates with people online about this design vs Hyperthreading. It seems like the overall discovery from Intel is that highly threaded tasks use less resources (cache, ROPs, etc. etc).

AMD did something similar before. Anyone don't remember the Bulldozer cores sharing resources between pair of cores ?

replies(2): >>42753443 #>>42760093 #
2. dragontamer ◴[] No.42753443[source]
AMD's modern SMT implementation is just better than Bulldozer's decoder sharing.

Modern Zen5 has very good SMT implementations. Back 10 years ago, people mostly talked about Bulldozer's design to make fun of it. It always was intriguing to me but it just never seemed to be practical in any workflow.

3. whatevaa ◴[] No.42760093[source]
In Bulldozer, two cores shared a single Floating Point Unit. A bet was made made that majority calculations will be done in integers. Unfortunately, actual world requires decimal places, and languages like Javascript only have floating point numbers. Bulldozer flopped hard.