
Why is Apple Rosetta 2 fast? (2022)

(dougallj.wordpress.com)
172 points fanf2 | 4 comments
Syonyk ◴[] No.42188705[source]
Post got the big one: Total Store Ordering (TSO).

The rest are all techniques in reasonably common use, but unless you have hardware support for x86's strong memory ordering, you cannot get very good x86-on-ARM performance. When inspecting existing code, it's by no means clear when strong memory ordering matters and when it doesn't, so you have to liberally sprinkle memory barriers around, which really kill performance.

The huge and fast L1I/L1D cache doesn't hurt things either... emulation tends to be cache-intensive.
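
To make the barrier problem concrete, here is a minimal sketch of the classic message-passing pattern as a translator without hardware TSO might have to emit it. It assumes C++ acquire/release atomics as a rough stand-in for x86 ordering (TSO is somewhat stronger than this), and the names `payload`, `ready`, `producer_translated`, `consumer_translated` and `consumer_unordered` are hypothetical, not taken from Rosetta 2 or any real translator.

```cpp
#include <atomic>

// Hypothetical guest memory shared by two emulated x86 threads.
std::atomic<int> payload{0};
std::atomic<int> ready{0};

// Guest code: mov [payload], 42 ; mov [ready], 1
// x86 never reorders these two stores. A translator cannot tell which
// store is the "publishing" one, so it orders every store (release).
void producer_translated() {
    payload.store(42, std::memory_order_release);
    ready.store(1, std::memory_order_release);
}

// Guest code: spin on [ready], then mov eax, [payload]
// Likewise every load is ordered (acquire), needed or not.
int consumer_translated() {
    while (ready.load(std::memory_order_acquire) == 0) {
        // spin until the producer publishes
    }
    return payload.load(std::memory_order_acquire);
}

// The same guest code with ordering dropped, i.e. plain weakly ordered
// accesses. Cheaper, but when run concurrently with the producer on a
// weakly ordered machine it is permitted to observe ready == 1 and
// still read a stale payload.
int consumer_unordered() {
    while (ready.load(std::memory_order_relaxed) == 0) {
        // spin until the producer publishes
    }
    return payload.load(std::memory_order_relaxed);
}
```

Run `producer_translated` and `consumer_translated` on two threads and the consumer is guaranteed to return 42. With `consumer_unordered` that guarantee goes away, which is the trade-off between correctness and speed a translator faces on hardware without a TSO mode.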

replies(6): >>42188819 #>>42189266 #>>42189505 #>>42189556 #>>42189596 #>>42197760 #
jsheard ◴[] No.42188819[source]
It's surprising that (AFAIK) Qualcomm didn't implement TSO in the chips they made for the recent-ish Windows ARM machines. If anything they need fast x86 emulation even more than Apple does, since Windows has a much longer tail of software support than macOS. There are going to be important Windows apps that stubbornly refuse to support native ARM basically forever.
replies(8): >>42188869 #>>42188881 #>>42188889 #>>42188901 #>>42189055 #>>42189531 #>>42189551 #>>42193997 #
1. deaddodo ◴[] No.42188881[source]
Microsoft's AoT+JiT techniques still pull off impressive performance (90+% in almost every case, 96-99% in the majority).

But yes, if they were actually serious about Windows on ARM, they would have implemented TSO in their "custom" Qualcomm SQ1/SQ2 chips.

replies(2): >>42189365 #>>42195591 #
2. wtallis ◴[] No.42189365[source]
Last time I checked, the default behavior for Microsoft's translation was to pretend that the hardware is doing TSO, and hope it works out. So that should obviously be fast, but occasionally wrong.
replies(1): >>42189509 #
3. saagarjha ◴[] No.42189509[source]
They're a decent bit smarter than that, but yes, their emulation is not quite correct.
4. 486sx33 ◴[] No.42195591[source]
It's funny, Microsoft's ARM emulation is good because of Qualcomm, or rather in spite of Qualcomm's limitations.

If Qualcomm had done better, then the software wouldn't have to be so good, and they'd likely have maintained more market share.

Instead, Microsoft had to make their x86-on-ARM emulation good enough to work on Qualcomm's crap, which now works really nicely on Apple ARM.