Why is Apple Rosetta 2 fast? (2022)

(dougallj.wordpress.com)
172 points fanf2 | 1 comments
Syonyk ◴[] No.42188705[source]
Post got the big one: Total Store Ordering (TSO).

The rest are all techniques in reasonably common use, but unless you have hardware support for x86's strong memory ordering, you cannot get very good x86-on-ARM performance. It's by no means clear from inspecting existing code when strong memory ordering matters and when it doesn't, so you have to liberally sprinkle memory barriers around, which really kills performance.
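To make the ordering problem concrete, here is a minimal C++ sketch of the classic message-passing pattern. All names in it (`payload`, `ready`, `producer`, `consumer`, `run_message_passing`) are made up for illustration, and it is not taken from Rosetta 2 or any other translator. It only shows where the source language has to ask for ordering that x86 hardware provides on every ordinary load and store.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names throughout; an illustration, not code from any emulator.
int payload = 0;                  // ordinary data, written before the flag
std::atomic<bool> ready{false};   // flag that publishes the payload

void producer() {
    payload = 42;
    // Release store: the write to payload becomes visible before the flag.
    // Under x86 TSO every ordinary store already behaves this way;
    // on AArch64 the compiler must emit STLR or a barrier.
    ready.store(true, std::memory_order_release);
}

int consumer() {
    // Acquire load: the read of payload cannot move ahead of the flag read.
    // Under x86 TSO every ordinary load already behaves this way;
    // on AArch64 the compiler must emit LDAR/LDAPR or a barrier.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;
}

// Runs one writer and one reader thread and returns what the reader saw.
int run_message_passing() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;
    std::thread reader([&seen] { seen = consumer(); });
    std::thread writer(producer);
    writer.join();
    reader.join();
    assert(seen == 42);  // guaranteed by the release/acquire pairing
    return seen;
}
```

In source code the two synchronizing accesses are marked explicitly. In a compiled x86 binary that marking is gone: the flag store and the payload store are both plain `mov` instructions. A translator without hardware TSO therefore has to assume that any load or store might be the one that matters and order all of them conservatively.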

The huge and fast L1I/L1D cache doesn't hurt things either... emulation tends to be cache-intensive.

replies(6): >>42188819 #>>42189266 #>>42189505 #>>42189556 #>>42189596 #>>42197760 #
jsheard ◴[] No.42188819[source]
It's surprising that (AFAIK) Qualcomm didn't implement TSO in the chips they made for the recent-ish Windows ARM machines. If anything, they need fast x86 emulation even more than Apple does, since Windows has a much longer tail of software support than macOS: there are going to be important Windows apps that stubbornly refuse to support native ARM basically forever.
replies(8): >>42188869 #>>42188881 #>>42188889 #>>42188901 #>>42189055 #>>42189531 #>>42189551 #>>42193997 #
scottlamb ◴[] No.42188869[source]
Does Windows's translation take advantage of such hardware features where they exist? E.g. if I launch an aarch64 Windows VM on my M2, does it use the M2's support for TSO when running x86_64 .exes, or does it insert these memory barriers?

If not, it makes sense that Qualcomm didn't bother adding them.

replies(3): >>42188900 #>>42188924 #>>42189541 #
1. zeusk ◴[] No.42188924{3}[source]
The OS can use what the hardware supports. macOS does because SEG is a tightly integrated group at Apple, whereas Microsoft treats hardware vendors at arm's length (pun unintended). There is roadmap sharing and there are planning events through leadership, but it is not as cohesive as it is at Apple.