Why is Apple Rosetta 2 fast? (2022)

(dougallj.wordpress.com)

177 points fanf2 | 2 comments | 19 Nov 24 21:42 UTC

Syonyk ◴[19 Nov 24 22:18 UTC] No.42188705[source]▶

Post got the big one: Total Store Ordering (TSO).

The rest are all techniques in reasonably common use, but unless you have hardware support for x86's strong memory ordering, you cannot get very good x86-on-ARM performance. Inspecting existing code rarely tells you when strong memory ordering matters and when it doesn't, so you have to liberally sprinkle memory barriers around, and those really kill performance.

The huge and fast L1I/L1D cache doesn't hurt things either... emulation tends to be cache-intensive.
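
To make the barrier cost concrete, here is a rough sketch of a deliberately naive translator. It is a toy, not Rosetta 2's actual code generation: the `translate` helper and the `(op, reg, addr)` tuples are hypothetical, and a real translator would likely use cheaper sequences than a full `dmb ish` after every access. The point is only that without hardware TSO, every translated load and store picks up extra ordering work.

```python
# Toy illustration only. translate() and the op tuples are hypothetical names,
# and the output is AArch64-style pseudo-assembly, not real Rosetta 2 output.

def translate(x86_ops, hardware_tso):
    """Map simplified x86 memory ops to AArch64-style pseudo-assembly.

    x86_ops: list of (op, reg, addr) with op in {"load", "store"}.
    hardware_tso: True if the CPU enforces x86-style ordering itself.
    """
    out = []
    for op, reg, addr in x86_ops:
        if op == "load":
            out.append(f"ldr {reg}, [{addr}]")
        elif op == "store":
            out.append(f"str {reg}, [{addr}]")
        else:
            raise ValueError(f"unknown op: {op}")
        if not hardware_tso:
            # The translator cannot tell whether this access is shared between
            # threads, so it conservatively emits a full barrier every time.
            out.append("dmb ish")
    return out


ops = [("store", "x0", "x1"), ("store", "x2", "x3"), ("load", "x4", "x3")]
with_tso = translate(ops, hardware_tso=True)
without_tso = translate(ops, hardware_tso=False)
print(len(with_tso), "instructions with hardware TSO")
print(len(without_tso), "instructions without it")
```

With hardware TSO the translated sequence is the same length as the input; without it, this naive scheme doubles the instruction count, and each added instruction is a barrier.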

replies(6): >>42188819 #>>42189266 #>>42189505 #>>42189556 #>>42189596 #>>42197760 #

jsheard ◴[19 Nov 24 22:35 UTC] No.42188819[source]▶

>>42188705 #

It's surprising that (AFAIK) Qualcomm didn't implement TSO in the chips they made for the recent-ish Windows ARM machines. If anything they need fast x86 emulation even more than Apple does, since Windows has a much longer tail of software support than macOS: there are going to be important Windows apps that stubbornly refuse to support native ARM basically forever.

replies(8): >>42188869 #>>42188881 #>>42188889 #>>42188901 #>>42189055 #>>42189531 #>>42189551 #>>42193997 #

deaddodo ◴[19 Nov 24 22:45 UTC] No.42188881[source]▶

>>42188819 #

Microsoft's AoT+JiT techniques still pull off impressive performance (90+% in almost every case, 96-99% in the majority).

But yes, if they were actually serious about Windows on ARM, they would have implemented TSO in their "custom" Qualcomm SQ1/SQ2 chips.

replies(2): >>42189365 #>>42195591 #

1. wtallis ◴[19 Nov 24 23:54 UTC] No.42189365[source]▶

>>42188881 #

Last time I checked, the default behavior for Microsoft's translation was to pretend that the hardware is doing TSO, and hope it works out. So that should obviously be fast, but occasionally wrong.
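
As a sketch of what "occasionally wrong" can look like, the snippet below brute-forces the classic message-passing litmus test under two simplified models. This is an assumption-laden toy, not a description of Microsoft's translator or of the real ARM memory model: the "weak" mode simply lets each thread reorder its two accesses, and all names (`WRITER`, `READER`, `outcomes`) are made up for illustration.

```python
from itertools import permutations

# Message-passing litmus test. The writer publishes data, then sets a flag;
# the reader checks the flag, then reads the data.
WRITER = [("store", "data"), ("store", "flag")]
READER = [("load", "flag"), ("load", "data")]


def thread_orders(ops, weak):
    # Simplification: the weak model may reorder accesses to different
    # addresses. TSO keeps store-store and load-load order, which is all this
    # particular test exercises (it has no store followed by a load).
    return [list(p) for p in permutations(ops)] if weak else [list(ops)]


def interleavings(a, b):
    # Yield every merge of a and b that preserves each list's internal order.
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def outcomes(weak):
    """Return the set of (flag_seen, data_seen) pairs the reader can observe."""
    seen = set()
    for w in thread_orders(WRITER, weak):
        for r in thread_orders(READER, weak):
            for schedule in interleavings(w, r):
                mem = {"data": 0, "flag": 0}
                regs = {}
                for kind, var in schedule:
                    if kind == "store":
                        mem[var] = 1
                    else:
                        regs[var] = mem[var]
                seen.add((regs["flag"], regs["data"]))
    return seen


print("TSO-like:", sorted(outcomes(weak=False)))
print("weak:    ", sorted(outcomes(weak=True)))
```

In this model the reader can see the flag set while the data is still stale, the pair `(1, 0)`, only in the weak mode. x86 code written to rely on that outcome being impossible is the kind of code that could break if translated without barriers onto weakly ordered hardware.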

replies(1): >>42189509 #

2. saagarjha ◴[20 Nov 24 00:17 UTC] No.42189509[source]▶

>>42189365 (TP) #

They're a decent bit smarter than that but yes their emulation is not quite correct.
