
Why is Apple Rosetta 2 fast? (2022)

(dougallj.wordpress.com)
172 points by fanf2 | 1 comment
Syonyk (No.42188705)
The post got the big one: Total Store Ordering (TSO).

The rest are all techniques in reasonably common use. But unless you have hardware support for x86's strong memory ordering, you cannot get very good x86-on-ARM performance. Inspecting existing code, it's by no means clear when strong memory ordering matters and when it doesn't, so you have to liberally sprinkle memory barriers around, and those really kill performance.

The huge and fast L1I/L1D cache doesn't hurt things either... emulation tends to be cache-intensive.

replies(6): >>42188819 #>>42189266 #>>42189505 #>>42189556 #>>42189596 #>>42197760 #
jsheard ◴[] No.42188819[source]
It's surprising that (AFAIK) Qualcomm didn't implement TSO in the chips they made for the recent-ish Windows ARM machines. If anything, they need fast x86 emulation even more than Apple does, since Windows has a much longer tail of software support than macOS; there are going to be important Windows apps that stubbornly refuse to support native ARM basically forever.
scottlamb (No.42188869)
Does Windows's translation take advantage of those where they exist? E.g. if I launch an aarch64 Windows VM on my M2, does it use the M2's support for TSO when running x86_64 .exes or does it insert these memory barriers?

If not, it makes sense that Qualcomm didn't bother adding them.

Syonyk (No.42188900)
I would expect it not to use TSO, because the toggle for it isn't, to the best of my knowledge, a general userspace toggle. It's something the kernel has to toggle, so a VM may or may not (probably does not) even have access to the SCRs (system control registers) to change it.
saagarjha (No.42189543)
This is exposed to guest kernels of Sequoia (and maybe earlier?).