
Why is Apple Rosetta 2 fast? (2022)

(dougallj.wordpress.com)
172 points fanf2 | 5 comments
Syonyk ◴[] No.42188705[source]
Post got the big one: Total Store Ordering (TSO).

The rest are all techniques in reasonably common use. But without hardware support for x86's strong memory ordering, you cannot get very good x86-on-ARM performance: inspecting existing code rarely tells you where strong memory ordering matters and where it doesn't, so you have to liberally sprinkle memory barriers around, and those really kill performance.
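To make the barrier cost concrete, here is a minimal toy sketch in Python. It is not how Rosetta 2 or any real translator works. The `translate` function and its op format are hypothetical, and the load-then-`dmb ishld` / `dmb ishst`-then-store mapping is just one conservative way to preserve x86 ordering on a weakly ordered ARM core; real translators may use acquire/release instructions instead.

```python
from typing import List


def translate(x86_ops: List[str], hardware_tso: bool) -> List[str]:
    """Toy model: map x86 memory ops ("load a", "store b") to ARM-like ops.

    Without hardware TSO, every load and store gets a barrier, because the
    translator cannot tell which accesses actually depend on x86 ordering.
    """
    out: List[str] = []
    for op in x86_ops:
        kind, addr = op.split()
        if kind == "load":
            out.append(f"ldr {addr}")
            if not hardware_tso:
                # Orders this load before all later loads and stores.
                out.append("dmb ishld")
        elif kind == "store":
            if not hardware_tso:
                # Orders all earlier stores before this store.
                out.append("dmb ishst")
            out.append(f"str {addr}")
        else:
            raise ValueError(f"unknown op: {op}")
    return out


def barrier_count(arm_ops: List[str]) -> int:
    """Count the barrier instructions in a translated sequence."""
    return sum(1 for op in arm_ops if op.startswith("dmb"))
```

With `hardware_tso=True` the output is just the plain loads and stores. With it off, the barrier count equals the number of memory accesses, which is the overhead being described.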

The huge and fast L1I/L1D caches don't hurt things either... emulation tends to be cache-intensive.

replies(6): >>42188819 #>>42189266 #>>42189505 #>>42189556 #>>42189596 #>>42197760 #
jsheard ◴[] No.42188819[source]
It's surprising that (AFAIK) Qualcomm didn't implement TSO in the chips they made for the recent-ish Windows ARM machines. If anything they need fast x86 emulation even more than Apple does, since Windows has a much longer tail of software support than macOS: there are going to be important Windows apps that stubbornly refuse to support native ARM basically forever.
replies(8): >>42188869 #>>42188881 #>>42188889 #>>42188901 #>>42189055 #>>42189531 #>>42189551 #>>42193997 #
scottlamb ◴[] No.42188869[source]
Does Windows's translation take advantage of hardware TSO where it exists? E.g. if I launch an aarch64 Windows VM on my M2, does it use the M2's support for TSO when running x86_64 .exes, or does it insert these memory barriers?

If not, it makes sense that Qualcomm didn't bother adding it.

replies(3): >>42188900 #>>42188924 #>>42189541 #
Syonyk ◴[] No.42188900[source]
I would expect it to not use TSO, because the toggle for it isn't, to the best of my knowledge, a general userspace toggle. It's something the kernel has to toggle, and so a VM may or may not (probably does not) even have access to the SCRs (system control registers) to change it.
replies(2): >>42188929 #>>42189543 #
1. zeusk ◴[] No.42188929[source]
TSO toggle on Apple Silicon is a user-space accessible/writable register.

It is used when you install Rosetta 2 for Linux VMs:

https://developer.apple.com/documentation/virtualization/run...

replies(2): >>42188964 #>>42189032 #
2. Syonyk ◴[] No.42188964[source]
Are you sure it's userspace accessible?

Based on https://github.com/saagarjha/TSOEnabler/blob/master/TSOEnabl..., it's a field in ACTLR_EL1, which is explicitly (per the ARMv8 spec, at least...) not accessible to userspace (EL0) execution.

There may be some kernel interface to allow userspace to toggle that, but that's not the same as being a userspace-accessible SCR. I also wouldn't expect it to be passed through to a VM: you'd likely need a hypercall to toggle it, unless the hypervisor emulated that, though admittedly I'm not quite as deep in the weeds on ARMv8 virtualization as I would prefer at the moment.

replies(1): >>42189003 #
3. zeusk ◴[] No.42189003[source]
Hmm, you’re right. Maybe my memory serves me incorrectly, but it does seem that access is privileged, while the interface for toggling the bit is open to all processes.
4. shadowfacts ◴[] No.42189032[source]
It is not directly accessible from user-space. Making it so requires kernel support. Apple published a set of patches for doing this on Linux: https://developer.apple.com/documentation/virtualization/acc...

Without that kernel support, all processes in the VM (not just Rosetta-translated ones) are opted-in to TSO:

> Without selective enablement, the system opts all processes into this memory mode [TSO], which degrades performance for native ARM processes that don’t need it.
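A toy model of the distinction drawn in this subthread, purely illustrative: `ToyCpu`, `ToyKernel`, and `set_memory_model` are hypothetical names, not the interface in Apple's patches. It assumes the TSO bit is writable only at EL1, and that a kernel offering selective enablement records each process's choice and applies it on context switch.

```python
class ToyCpu:
    """Models one core with a privileged TSO control bit."""

    def __init__(self) -> None:
        self.tso_enabled = False

    def write_tso_bit(self, value: bool, exception_level: int) -> None:
        # Assumption: the control register is only writable at EL1 or above.
        if exception_level < 1:
            raise PermissionError("EL1 register is not writable from EL0")
        self.tso_enabled = value


class ToyKernel:
    """Models a kernel that toggles TSO per process on context switch."""

    def __init__(self, cpu: ToyCpu) -> None:
        self.cpu = cpu
        self.wants_tso: dict = {}

    def set_memory_model(self, pid: int, tso: bool) -> None:
        # Hypothetical syscall: any process may request TSO for itself.
        self.wants_tso[pid] = tso

    def switch_to(self, pid: int) -> None:
        # The kernel runs at EL1, so it may write the bit on the process's behalf.
        self.cpu.write_tso_bit(self.wants_tso.get(pid, False), exception_level=1)
```

In this model a translated process opts in through the kernel, while native processes keep the default weak ordering and so avoid the performance cost mentioned in the quote.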

replies(1): >>42189575 #
5. mrpippy ◴[] No.42189575[source]
Before Sequoia, a Linux VM using Rosetta would have TSO enabled all the time.

With Sequoia, TSO is not enabled for Linux VMs, and that kernel patch (posted in the last few weeks) is required for Rosetta to be able to enable TSO for itself. If the kernel patch isn't present, Rosetta has a non-TSO fallback mode.
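The behavior described in this comment can be summarized as a small decision function. This is only a restatement of the comment under its own assumptions. The function name and the returned mode strings are made up for illustration and do not correspond to anything in Rosetta.

```python
def rosetta_memory_mode(host_is_sequoia_or_later: bool,
                        guest_kernel_has_tso_patch: bool) -> str:
    """Return which memory mode a Rosetta-translated process ends up in."""
    if not host_is_sequoia_or_later:
        # Before Sequoia: TSO is on for the whole Linux VM, all the time.
        return "tso-always-on"
    if guest_kernel_has_tso_patch:
        # Sequoia plus patched kernel: Rosetta enables TSO for itself only.
        return "tso-rosetta-only"
    # Sequoia without the patch: Rosetta falls back to its non-TSO mode.
    return "non-tso-fallback"
```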