
rwmj No.42752085
Slightly off topic, but if I'm aiming to get the fastest 'make -jN' for some random C project (such as the kernel) should I set N = #P [threads] + #E, or just the #P, or something else? Basically, is there a case where using the E cores slows a compile down? Or is power management a factor?

I timed it on the single Intel machine I have access to with E-cores and setting N = #P + #E was in fact the fastest, but I wonder if that's a general rule.
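For reference: on Linux, GNU coreutils' nproc reports the number of online logical CPUs, which on a hybrid part like this is exactly #P threads + #E cores, so that setting can be written without hard-coding the count (assuming nproc is installed):

  make -j"$(nproc)"

nproc also respects the process's CPU affinity mask, so it stays correct under taskset or inside a container with a restricted cpuset.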

replies(3): >>42752108, >>42752141, >>42752456
saurik No.42752108
Did you test at least N+1, if not N*1.5 or something? I would expect you to occasionally get blocked on disk I/O and would want some spare work sitting hot to switch in.
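In shell terms (assuming GNU nproc, and integer-only arithmetic, so 1.5x is approximated), those variants would be roughly:

  make -j"$(( $(nproc) + 1 ))"        # threads + 1
  make -j"$(( $(nproc) * 3 / 2 ))"    # ~1.5 * threads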
replies(1): >>42752119
rwmj No.42752119
Let me test that now. Note I only have 1 Intel machine so any results are very specific to this laptop.

  -j           time (mean ± σ)
  12 (#P+#E)   130.889 s ±  4.072 s
  13 (..+1)    135.049 s ±  2.270 s
   4 (#P)      179.845 s ±  1.783 s
   8 (#E)      141.669 s ±  3.441 s
Machine: 13th Gen Intel(R) Core(TM) i7-1365U; 2 x P-cores (4 threads), 8 x E-cores
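Aside: the mean ± σ layout looks like hyperfine output; if that's what produced the table, a parameter scan along these lines would run the whole sweep in one go (the clean step and the job list here are assumptions):

  hyperfine --prepare 'make clean' \
    --parameter-list jobs 4,8,12,13 \
    'make -j{jobs}'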
replies(1): >>42752557
wtallis No.42752557
Your processor has two P cores, and ten cores total, not twelve. HyperThreading (SMT) does not turn the two P cores into four. Your experiment with 4 threads most likely used both P cores and two E cores, since no sane OS would double up threads on the P cores before each E core had a thread of its own.
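To see exactly how the logical CPUs map onto cores on a machine like this, util-linux's lscpu can print the topology (the exact columns available vary a little between versions):

  lscpu -e=CPU,CORE,MAXMHZ
  # logical CPUs sharing a CORE number are SMT siblings of a P core;
  # E cores show up as one CPU per core, typically with a lower max clock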
replies(2): >>42752610, >>42752692
rwmj No.42752610
Hyperthreading should help hide memory latency, since the workload (compiling qemu) might not fit into L3 cache. Although I take your point that it doesn't magically create two core-equivalents.
replies(1): >>42753268
gonzo No.42753268
“Hyperthreading” is a write pipe hack.

If the core stalls on a write then the other thread gets run.

replies(1): >>42756450
atq2119 No.42756450
It's much more than that. It also allows one thread to make progress while the other is waiting for memory loads, or filling in instruction slots while the other thread is recovering from a branch mispredict.

Compilers tend to do a lot of pointer chasing and branching, so it's expected that they would benefit decently from hyperthreading.
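One way to put a rough number on that: pin a two-job compile to the two SMT siblings of a single P core, then to two separate physical cores, and compare wall-clock times. The CPU numbers below are only a guess at this machine's numbering — check them against lscpu -e first:

  make clean && time taskset -c 0,1 make -j2   # assumed: CPUs 0,1 = both threads of one P core
  make clean && time taskset -c 0,2 make -j2   # assumed: CPUs 0,2 = two different physical cores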