Why is Apple Rosetta 2 fast? (2022)

(dougallj.wordpress.com)
172 points by fanf2 | 1 comment
Syonyk No.42188705
Post got the big one: Total Store Ordering (TSO).

The rest are all techniques in reasonably common use, but unless you have hardware support for x86's strong memory ordering, you cannot get very good x86-on-ARM performance. From inspecting existing code, it's by no means clear when strong memory ordering matters and when it doesn't, so you have to sprinkle memory barriers around liberally, and those really kill performance.
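One way to picture the "sprinkle barriers" strategy is the classic message-passing pattern, written with C11 atomics. This is a minimal sketch that assumes a translator with no hardware TSO mode, which therefore emits every guest x86 load as an acquire load and every guest store as a release store. `guest_load`, `guest_store`, `producer`, `consumer` and `run_handoff` are hypothetical names for illustration, not Rosetta's (or any other emulator's) actual code. On AArch64, acquire loads and release stores typically compile to LDAR/LDAPR and STLR. The mapping approximates x86 ordering: it preserves load-load, load-store and store-store order, and still permits store-load reordering, as TSO does.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical translator helpers: every guest load/store gets
   acquire/release ordering, restoring the load-load, load-store and
   store-store order that x86 code silently relies on. */
static inline int guest_load(atomic_int *p) {
    return atomic_load_explicit(p, memory_order_acquire);
}

static inline void guest_store(atomic_int *p, int v) {
    atomic_store_explicit(p, v, memory_order_release);
}

static atomic_int data;   /* static storage: zero-initialized */
static atomic_int ready;

/* Guest thread 1: write the payload, then publish the flag. */
static void *producer(void *arg) {
    (void)arg;
    guest_store(&data, 42);
    guest_store(&ready, 1);
    return NULL;
}

/* Guest thread 2: wait for the flag, then read the payload. */
static void *consumer(void *arg) {
    int *seen = arg;
    while (guest_load(&ready) == 0) {
        /* spin until the producer publishes */
    }
    *seen = guest_load(&data);
    /* With relaxed accesses on ARM this could observe 0;
       with the acquire/release mapping it cannot. */
    assert(*seen == 42);
    return NULL;
}

/* Runs one handoff and returns the value the consumer observed
   (-1 if a thread could not be created). */
static int run_handoff(void) {
    pthread_t p, c;
    int seen = 0;
    atomic_store(&data, 0);
    atomic_store(&ready, 0);
    if (pthread_create(&p, NULL, producer, NULL) != 0) return -1;
    if (pthread_create(&c, NULL, consumer, &seen) != 0) {
        pthread_join(p, NULL);
        return -1;
    }
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```

`run_handoff()` should return 42 every time. The catch is the one described above: the translator can't tell from the binary which accesses actually race, so every load and store pays for ordering, including the many that only touch thread-private data. Hardware TSO support lets the translator keep plain loads and stores instead.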

The huge and fast L1I/L1D cache doesn't hurt things either... emulation tends to be cache-intensive.

jsheard No.42188819
It's surprising that (AFAIK) Qualcomm didn't implement TSO in the chips they made for the recent-ish Windows ARM machines. If anything, they need fast x86 emulation even more than Apple does, since Windows has a much longer tail of software support than macOS; there are going to be important Windows apps that stubbornly refuse to support native ARM basically forever.
Syonyk No.42188889
My guess is that the sort of "legacy x86-forever" apps for Windows don't really need much in the way of performance. Think of your classic Visual Basic 6 sort of thing that a business has relied on for decades.

I'm also fairly certain that the TSO changes to the memory system are non-trivial, and it's possible that Qualcomm doesn't see it as a value-add in their chips, and they're probably right. Windows machines are such a hot mess that outside a relatively small group of users (who probably run Linux anyway, so aren't anyone's target market), nobody would know or care what TSO is. If it adds cost and power and doesn't matter, why bother?

adrian_b No.42193059
TSO only matters for programs that are internally multithreaded or which run multiple processes that have shared memory segments.

Most legacy programs, like Visual Basic 6 apps, are not of this kind.

For all other kinds of applications, the operating system handles the concurrency, and it does so in the correct way for the native platform.
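For contrast, here is the same publish-then-read sequence performed entirely within one thread, as a minimal sketch using a hypothetical `single_thread_handoff` function. A thread always observes its own memory operations in program order, whatever the hardware memory model, so fully relaxed (unordered) accesses are already correct. That is why a single-threaded program with no shared-memory segments doesn't care whether the emulated ordering is TSO.

```c
#include <assert.h>
#include <stdatomic.h>

/* One thread publishes and reads back its own data. No other thread can
   observe these locations, so no ordering guarantees are required. */
static int single_thread_handoff(void) {
    atomic_int data, ready;
    atomic_init(&data, 0);
    atomic_init(&ready, 0);

    atomic_store_explicit(&data, 42, memory_order_relaxed);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);

    /* No fence needed: a thread's loads cannot miss its own earlier stores. */
    assert(atomic_load_explicit(&ready, memory_order_relaxed) == 1);
    return atomic_load_explicit(&data, memory_order_relaxed);
}
```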

Nevertheless, the few programs for which TSO matters are also those where performance must have mattered, if the developers bothered to implement concurrent code. Therefore, low performance of the emulated application would be noticeable.