
Why is Apple Rosetta 2 fast? (2022)

(dougallj.wordpress.com)
172 points fanf2 | 5 comments
Syonyk ◴[] No.42188705[source]
Post got the big one: Total Store Ordering (TSO).

The rest are all techniques in reasonably common use, but unless you have hardware support for x86's strong memory ordering, you cannot get very good x86-on-ARM performance. It's by no means clear from inspecting existing code when strong memory ordering matters and when it doesn't, so you have to liberally sprinkle memory barriers around, which really kill performance.
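
To make the barrier cost concrete, here is a rough C++ sketch of the choice a translator faces. The helper names (`guest_load_conservative`, `guest_store_hw_tso`, and so on) are hypothetical and are not taken from Rosetta 2 or any real emulator. The acquire/release mapping only approximates x86-TSO, and the `relaxed` variants model the plain instructions a translator could emit on a TSO-capable host, not something portable C++ guarantees.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical lowering of guest (x86) memory accesses on a weakly ordered host.
// Without hardware TSO the translator cannot tell which accesses are visible to
// other threads, so every load and store carries ordering (assumption: acquire
// and release are used as an approximation of x86-TSO).
inline std::uint64_t guest_load_conservative(const std::atomic<std::uint64_t>& cell) {
    return cell.load(std::memory_order_acquire);
}

inline void guest_store_conservative(std::atomic<std::uint64_t>& cell, std::uint64_t value) {
    cell.store(value, std::memory_order_release);
}

// With a hardware TSO mode the host's plain loads and stores already keep
// x86-like ordering, so the translator could emit unordered accesses.
// (memory_order_relaxed stands in for "plain instruction" in this model.)
inline std::uint64_t guest_load_hw_tso(const std::atomic<std::uint64_t>& cell) {
    return cell.load(std::memory_order_relaxed);
}

inline void guest_store_hw_tso(std::atomic<std::uint64_t>& cell, std::uint64_t value) {
    cell.store(value, std::memory_order_relaxed);
}

// A translated "load, add 1, store" sequence using the conservative helpers:
// both memory accesses pay for ordering even if no other thread ever looks.
inline void copy_incremented(std::atomic<std::uint64_t>& dst,
                             const std::atomic<std::uint64_t>& src) {
    guest_store_conservative(dst, guest_load_conservative(src) + 1);
}

// Single-threaded sanity check: both lowerings compute the same value;
// they differ only in the ordering (and cost) attached to each access.
inline void self_check() {
    std::atomic<std::uint64_t> src{41}, dst{0};
    copy_incremented(dst, src);
    assert(guest_load_hw_tso(dst) == 42);
}
```

The point of the sketch is that the conservative path has to be applied to every guest access, because the translator sees only ordinary x86 loads and stores and cannot know which ones another thread depends on.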

The huge and fast L1I/L1D cache doesn't hurt things either... emulation tends to be cache-intensive.

replies(6): >>42188819 #>>42189266 #>>42189505 #>>42189556 #>>42189596 #>>42197760 #
jsheard ◴[] No.42188819[source]
It's surprising that (AFAIK) Qualcomm didn't implement TSO in the chips they made for the recent-ish Windows ARM machines. If anything they need fast x86 emulation even more than Apple does, since Windows has a much longer tail of software support than macOS. There are going to be important Windows apps that stubbornly refuse to support native ARM basically forever.
replies(8): >>42188869 #>>42188881 #>>42188889 #>>42188901 #>>42189055 #>>42189531 #>>42189551 #>>42193997 #
1. Syonyk ◴[] No.42188889[source]
My guess is that the sort of "legacy x86-forever" apps for Windows don't really need much in the way of performance. Think your classic Visual Basic 6 sort of thing that a business relies on for decades.

I'm also fairly certain that the TSO changes to the memory system are non-trivial, and it's possible that Qualcomm doesn't see it as a value-add in their chips - and they're probably right. Windows machines are such a hot mess that outside a relatively small group of users (who probably run Linux anyway, so aren't anyone's target market), nobody would know or care what TSO is. If it adds cost and power and doesn't matter, why bother?

replies(3): >>42188907 #>>42189385 #>>42193059 #
2. jsheard ◴[] No.42188907[source]
> My guess is that the sort of "legacy x86-forever" apps for Windows don't really need much in the way of performance.

Games are a pretty notable exception that demand high performance and for the most part will be stuck on x86 forever. Brand new games might start shipping native ARM Windows binaries if the platform gets enough momentum, but games have very limited support lifecycles so it's unlikely that many released before that point will ever be updated to ARM native.

replies(1): >>42189590 #
3. tiagod ◴[] No.42189385[source]
> My guess is that the sort of "legacy x86-forever" apps for Windows don't really need much in the way of performance. Think your classic Visual Basic 6 sort of thing that a business relies on for decades.

In my experience, there's a lot of that kind of software around that was initially designed for a much simpler use-case, and has decades of badly coded features bolted on, with questionable algorithmic choices. It can be unreasonably slow on modern hardware.

Old government database sites are the worst examples in my experience. Clearly tested with a few hundred records, but 15 years later there's a few million and nobody bothered to create a bunch of indexes so searches take a couple minutes. I guess this way they can just charge to upgrade the hardware once in a while instead.

4. doctorpangloss ◴[] No.42189590[source]
> Brand new games might start shipping native ARM Windows binaries if the platform gets enough momentum, but games have very limited support lifecycles so it's unlikely that many released before that point will ever be updated to ARM native.

Unity supports Windows ARM. Unreal: probably never. IMO, the PC gaming market is so fragmented that, short of Microsoft developing games for the platform (at something like the multi-million pre-sales scale that EGS did), games on ARM will only happen by complete accident, not because it makes sense.

5. adrian_b ◴[] No.42193059[source]
TSO only matters for programs that are internally multithreaded or which run multiple processes that have shared memory segments.

Most legacy programs like Visual Basic 6 are not of this kind.

For any other kind of application, the operating system handles the concurrency, and it does this in the correct way for the native platform.

Nevertheless, the few programs for which TSO matters are also those where performance must have mattered if the developers bothered to implement concurrent code. Therefore low performance of the emulated application would be noticeable.
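
As an illustration of the kind of program where ordering matters, here is a minimal message-passing sketch in C++. The function name `message_passing_once` is made up for this example. In portable code the release/acquire pair is what makes the hand-off safe. An x86 binary gets the same guarantee from plain stores and loads because of TSO, which is exactly the ordering an emulator on a weaker host has to recreate.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// One thread publishes a payload and then sets a flag; the other waits for
// the flag and then reads the payload. This is only correct if the payload
// write becomes visible before the flag write (store-store order) and the
// flag read happens before the payload read (load-load order).
inline int message_passing_once() {
    int payload = 0;                 // plain shared data
    std::atomic<bool> ready{false};  // publication flag
    int observed = -1;

    std::thread consumer([&] {
        // Acquire: the payload read below cannot move ahead of this load.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        observed = payload;
    });

    std::thread producer([&] {
        payload = 42;
        // Release: the payload write above cannot move after this store.
        ready.store(true, std::memory_order_release);
    });

    producer.join();
    consumer.join();

    // Guaranteed by the release/acquire pairing above.
    assert(observed == 42);
    return observed;
}
```

A single-threaded program with no shared memory segments never has a second observer, so the order in which its stores become visible cannot affect its results.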