
128 points ksec | 3 comments
dragontamer ◴[] No.42751521[source]
The triple decoder is one unique feature. The fact that Intel managed to get the decoders lined up on small loops for an effective 9-wide instruction issue is basically miraculous IMO. Very well done.

Another unique feature is the L2 cache shared between 4 cores. This means that thread communication across those 4 cores has much lower latency.
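To see which logical CPUs actually share an L2 on a given machine, here is a minimal sketch. It assumes a Linux system that exposes the usual cache topology files under `/sys/devices/system/cpu`; the helper names `parse_cpu_list` and `l2_groups` are made up for illustration.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a kernel cpulist string such as "0-3,8" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def l2_groups(sysfs_root="/sys/devices/system/cpu"):
    """Return the distinct groups of logical CPUs that share one L2 cache."""
    groups = set()
    for cache_dir in Path(sysfs_root).glob("cpu[0-9]*/cache/index*"):
        try:
            level = (cache_dir / "level").read_text().strip()
            shared = (cache_dir / "shared_cpu_list").read_text()
        except OSError:
            continue  # entry missing or unreadable: skip it
        if level == "2":
            groups.add(tuple(parse_cpu_list(shared)))
    return sorted(groups)
```

Calling `l2_groups()` returns one tuple per L2 cache. On a part with the clustering described above, the E-core groups should come out four CPUs wide, while a P-core group would hold only that core's own hyperthreads. On a system without these sysfs files it simply returns an empty list.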

I've had lots of debates with people online about this design vs Hyperthreading. It seems like the overall discovery from Intel is that highly threaded tasks use fewer resources per thread (cache, ROB entries, etc.).

Big cores (P-cores or AMD Zen5) obviously can split into 2 hyperthreads, but what if that division is still too big? A cluster of E-cores supports 4 threads in roughly the same space as 1 P-core.

This is because the L2 cache is shared/consolidated, and other resources (reorder buffers, register files, etc.) are just all so much smaller on the E-core.

It's an interesting design. I'd still think that growing the cores to 4-way SMT (like Xeon Phi) or 8-way SMT (POWER10) would be a more conventional way to split up resources though. But obviously I don't work at Intel and can't make these kinds of decisions.

replies(8): >>42751667 #>>42751930 #>>42752001 #>>42752140 #>>42752196 #>>42752200 #>>42753025 #>>42753142 #
Salgat ◴[] No.42753142[source]
What we desperately need before we get too deep into this is stronger support in languages for heterogeneous cores in an architecture-agnostic way. Some way to annotate that certain threads should run on certain types of cores (and close together in the memory hierarchy) without getting too deep into implementation details.
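For reference, this is roughly what doing it by hand looks like today. It is a minimal, Linux-only sketch, and not the portable annotation being asked for. It assumes a kernel with hybrid-CPU support that exposes `/sys/devices/cpu_core/cpus` and `/sys/devices/cpu_atom/cpus`. The names `CORE_TYPE_FILES`, `cpus_of_type` and `run_on` are hypothetical.

```python
import os
from pathlib import Path

# Assumption: on Linux with hybrid Intel CPUs these files list the logical
# CPUs of each core type. They do not exist on non-hybrid machines.
CORE_TYPE_FILES = {
    "performance": "/sys/devices/cpu_core/cpus",
    "efficiency": "/sys/devices/cpu_atom/cpus",
}


def parse_cpu_list(text):
    """Parse a kernel cpulist string such as "0-15" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def cpus_of_type(kind, files=CORE_TYPE_FILES):
    """Return the logical CPUs of the given core type, or None if unknown."""
    try:
        return parse_cpu_list(Path(files[kind]).read_text())
    except (KeyError, OSError):
        return None


def run_on(kind):
    """Restrict the caller to cores of `kind`; return False if not possible."""
    cpus = cpus_of_type(kind)
    if not cpus or not hasattr(os, "sched_setaffinity"):
        return False
    allowed = cpus & os.sched_getaffinity(0)  # respect any existing restriction
    if not allowed:
        return False
    os.sched_setaffinity(0, allowed)  # 0 means "the caller"
    return True
```

A background worker would call `run_on("efficiency")` at startup and carry on unpinned if it returns `False`. The sysfs paths and the affinity call are exactly the kind of implementation detail the comment above wants the language or runtime to hide.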
replies(2): >>42754839 #>>42757713 #
dragontamer ◴[] No.42757713[source]
OpenMP, Intel's TBB and other libraries/tools are clearly moving in this direction.

The main issue is that Intel is... well, Intel. Even if they write a good library, there's probably a 0% chance it'd work well on ARM systems from their competitors. (And only a small chance that it'd be optimized for AMD.)

------

Microsoft did put a lot of work into ConcRT, but it doesn't look very successful. It's a very clean model of task-based scheduling, but I'm not seeing too much buzz about it or too many blog posts marketing the benefits.

replies(1): >>42761206 #
1. adgjlsfhk1 ◴[] No.42761206[source]
The other problem Intel has is that they are apparently a horrible factional mess of a company. The fact that the P and E cores are completely separate architectures that sometimes don't even agree on what instruction set they support (e.g. AVX-512) is kind of crazy.
replies(1): >>42764651 #
2. dragontamer ◴[] No.42764651[source]
AMD had Bulldozer and Bobcat back in the day. Two teams with two different goals is fine, as long as they work together at the end.

And P-cores and E-cores do seem like they are working together well in the "Ultra" series.

replies(1): >>42767912 #
3. adgjlsfhk1 ◴[] No.42767912[source]
Using AMD in the Bulldozer era as a comparison to Intel is a really bad sign.