
Why is Apple Rosetta 2 fast? (2022)

(dougallj.wordpress.com)
172 points fanf2 | 1 comment
Syonyk ◴[] No.42188705[source]
Post got the big one: Total Store Ordering (TSO).

The rest are all techniques in reasonably common use, but unless you have hardware support for x86's strong memory ordering, you cannot get very good x86-on-ARM performance. When inspecting existing code, it's by no means clear when strong memory ordering matters and when it doesn't, so you have to liberally sprinkle memory barriers around, which really kills performance.

The huge and fast L1I/L1D cache doesn't hurt things either... emulation tends to be cache-intensive.
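
To make the ordering point concrete, here is a toy sketch of the classic message-passing litmus test. It is a simplified model, not how any real CPU or Rosetta works: the names `WRITER`, `READER`, `interleavings`, `run` and `outcomes` are made up for illustration, "strong" just means each thread's accesses happen in program order, and "weak" means each thread's two accesses may be swapped. In-order execution is a fair stand-in for TSO on this particular test because TSO only lets a store be delayed past a later load, and neither thread here has a store followed by a load.

```python
from itertools import permutations

# Message-passing litmus test (all memory starts at 0).
#   Writer thread: data = 1; flag = 1
#   Reader thread: r_flag = flag; r_data = data
WRITER = [("store", "data", 1), ("store", "flag", 1)]
READER = [("load", "flag", "r_flag"), ("load", "data", "r_data")]


def interleavings(a, b):
    """Yield every merge of a and b that keeps each sequence's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute one global schedule and return (r_flag, r_data)."""
    memory = {"data": 0, "flag": 0}
    regs = {}
    for op, addr, arg in schedule:
        if op == "store":
            memory[addr] = arg
        else:  # load: arg is the destination register name
            regs[arg] = memory[addr]
    return regs["r_flag"], regs["r_data"]


def outcomes(allow_reordering):
    """Collect every (r_flag, r_data) result the toy model permits."""
    if allow_reordering:
        # Toy "weak" model: each thread's two accesses may swap.
        writer_orders = list(permutations(WRITER))
        reader_orders = list(permutations(READER))
    else:
        # Toy "strong" model: program order is kept within each thread.
        writer_orders = [tuple(WRITER)]
        reader_orders = [tuple(READER)]
    seen = set()
    for w in writer_orders:
        for r in reader_orders:
            for schedule in interleavings(list(w), list(r)):
                seen.add(run(schedule))
    return seen


if __name__ == "__main__":
    print("strong:", sorted(outcomes(allow_reordering=False)))
    print("weak:  ", sorted(outcomes(allow_reordering=True)))
```

The interesting result is `(1, 0)`: the reader sees the flag set but still reads stale data. The strong model never produces it, the weak one does. x86 code is written assuming that result cannot happen, and a translator usually cannot tell which loads and stores are the "flag" and "data" of some such pattern, which is why it ends up putting barriers everywhere unless the hardware provides TSO.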

replies(6): >>42188819 #>>42189266 #>>42189505 #>>42189556 #>>42189596 #>>42197760 #
jsheard ◴[] No.42188819[source]
It's surprising that (AFAIK) Qualcomm didn't implement TSO in the chips they made for the recent-ish Windows ARM machines. If anything they need fast x86 emulation even more than Apple does, since Windows has a much longer tail of software support than macOS: there are going to be important Windows apps that stubbornly refuse to support native ARM basically forever.
replies(8): >>42188869 #>>42188881 #>>42188889 #>>42188901 #>>42189055 #>>42189531 #>>42189551 #>>42193997 #
dundarious ◴[] No.42189551[source]
On a first order analysis, Qualcomm doesn't want good x64 support, because good x64 support furthers the lifetime of x64, and delays the "transition" to ARM. In the final analysis, I doubt that is an economically rational strategy, because even if there is to be a transition away from x64, you need a good legacy and migration story. And I doubt such a transition will happen in the next 10 years, and certainly not spurred by anything in Microsoft land.

So maybe it's rational after all, because they know these Windows ARM products will never succeed, so they're just saving themselves the cost/effort of good support.

replies(2): >>42190281 #>>42191069 #
wolpoli ◴[] No.42190281[source]
> On a first order analysis, Qualcomm doesn't want good x64 support, because good x64 support furthers the lifetime of x64, and delays the "transition" to ARM.

The logical thing for Qualcomm to do at their current market share is to implement TSO now, then after they get momentum, create a high-end/low-end tier, and disable TSO for the low-end tier to force vendors to target both ARM/x86.

What Qualcomm is doing now makes them look like they just don't care.

replies(1): >>42193742 #
1. Someone ◴[] No.42193742[source]
> create a high-end/low-end tier, and disable TSO for the low-end tier

Wouldn’t that make the low-end tier run faster than the high-end tier, or force them to leave some performance on the table there?

Also, would a per-process flag that controls TSO be possible? Ignoring whether it’s easy to do in the hardware, the only problem I can think of with that is that the OS would have to set that on processes when they start using shared memory, or forbid using shared memory by processes that do not have it set.
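
A rough sketch of those two options, purely as a thought experiment: nothing here is a real OS or hardware interface, and `Process`, `Policy` and `map_shared` are hypothetical names. It only models the bookkeeping, that is, what the OS would do to the flag at the moment a process asks for shared memory.

```python
from dataclasses import dataclass
from enum import Enum


class Policy(Enum):
    ENABLE_ON_SHARE = "turn TSO on when shared memory is first mapped"
    FORBID_WITHOUT_TSO = "refuse shared mappings unless TSO is already on"


@dataclass
class Process:
    name: str
    tso: bool = False            # hypothetical per-process ordering flag
    shares_memory: bool = False


def map_shared(proc, policy):
    """Hypothetical hook run when a process requests a shared mapping."""
    if not proc.tso:
        if policy is Policy.FORBID_WITHOUT_TSO:
            raise PermissionError(f"{proc.name}: shared memory requires TSO")
        # ENABLE_ON_SHARE: switch the process to strong ordering first.
        proc.tso = True
    proc.shares_memory = True
    return proc


if __name__ == "__main__":
    emulated = map_shared(Process("emulated-app"), Policy.ENABLE_ON_SHARE)
    print(emulated)
    try:
        map_shared(Process("native-app"), Policy.FORBID_WITHOUT_TSO)
    except PermissionError as err:
        print("refused:", err)
```

Either way the cost lands on the process that shares memory: it pays for strong ordering from that point on, or it is not allowed to share at all.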