
MacOS Catalina: Slow by Design?

(sigpipe.macromates.com)
2031 points jrk | 9 comments
jaimehrubiks ◴[] No.23273553[source]
In our company many of us have similar issues. I have always loved OSX but this time it is driving me crazy. I thought the issue was some sort of company antivirus/firewall, or it could even be a combination of that and this issue (maybe my vpn + path to company firewall is what magnifies the issue in this post). The thing is that some commands take 1 second, some others take 2 minutes or even more. Actually, some commands slow down the computer until they are finished (more likely, until they just decide to start).

For example, I can run "terraform apply" and it could take up to 5 minutes to start, leaving my computer almost unusable until it runs. The weird thing is that this only happens sometimes. In some cases, I restart the laptop and it starts working a little bit faster, but the issue comes back after some time.
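One way to put numbers on the start delay described here is to time repeated launches of a trivial command. This is only a minimal sketch: `time_launch` is a hypothetical helper name, and it measures total wall-clock time per run, so it cannot say which part of the launch is slow.

```python
import subprocess
import sys
import time


def time_launch(argv, runs=3):
    """Return the wall-clock seconds taken by each of `runs` launches of argv."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=False: only the elapsed time matters here, not the exit code.
        subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        timings.append(time.perf_counter() - start)
    return timings


# A command that does nothing, so almost all of the time is launch overhead.
for i, seconds in enumerate(time_launch([sys.executable, "-c", "pass"]), start=1):
    print(f"run {i}: {seconds:.3f}s")
```

Passing something like `["terraform", "version"]` instead would time the tool mentioned above, assuming it is on the PATH. A first run that is much slower than later ones would fit the "sometimes slow" pattern, though it would not prove the cause.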

For a few months now I have been running every command from a VM in a remote location, since I am tired of waiting for my commands to start.

I have a macbook air from 2013 which never had this issue.

Any easy fix that I could test? Disconnecting from the internet is not an option. Disabling SIP could be tried, but I think I already did and it didn't seem to fix it, plus it is not a good idea for a company laptop.

Don't we have some sort of hosts file or firewall that we can use to block or fake the connectivity to apple servers?

replies(5): >>23273869 #>>23273932 #>>23274213 #>>23275720 #>>23278491 #
derefr ◴[] No.23274213[source]
IIRC the big thing that changed with 10.15 for CLI applications is that BSD-userland processes (i.e. ones that don't go through all the macOS Frameworks, but just call libc syscall wrappers like fopen(2)) now also deal with sandboxing, since the BSD syscall ABI is now reimplemented in terms of macOS security capabilities.

Certain BSD-syscall-ABI operations like fopen(2) and readdir(2) are now not-so-fast by default, because the OS has to do a synchronous check of the individual process binary's capabilities before letting the syscall through. But POSIX utilities were written to assume that these operations were fast-ish, and therefore they do tons of them, rather than doing any sort of batching.

That means that any CLI process that "walks" the filesystem is going to generate huge amounts of security-subsystem request traffic; which seemingly bottlenecks the security subsystem (OS-wide!); and so slows down the caller process and any other concurrent processes/threads that need capabilities-grants of their own.
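To get a feel for how many per-file operations a "walker" issues, here is a rough sketch that counts them for a small temporary tree. `count_walk_operations` is a made-up name, and the counts are Python-level calls, not an exact syscall trace.

```python
import os
import tempfile


def count_walk_operations(root):
    """Walk `root` and count the per-entry filesystem calls a naive walker makes."""
    counts = {"scandir": 0, "stat": 0, "open": 0}
    for dirpath, _dirnames, filenames in os.walk(root):
        counts["scandir"] += 1  # os.walk lists each directory once
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                os.stat(path)  # one stat per file
                counts["stat"] += 1
                with open(path, "rb"):  # one open per file
                    counts["open"] += 1
            except OSError:
                continue  # e.g. broken symlink or permission denied
    return counts


# Demo on a throwaway tree: 2 directories, 3 files.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "sub"))
    for rel in ("a.txt", "b.txt", os.path.join("sub", "c.txt")):
        with open(os.path.join(root, rel), "w") as handle:
            handle.write("x")
    demo_counts = count_walk_operations(root)

print(demo_counts)
```

The number of operations grows linearly with the number of files, so a tree with a few hundred thousand entries means a few hundred thousand checks if each operation is checked individually.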

To find a fix, it's important to understand the problem in fine detail. So: the CLI process has a set of process-local capabilities (kernel tokens/handles); and whenever it tries to do something, it first tries to use these. If it turns out none of those existing capabilities let it perform the operation, then it has to request the kernel look at it, build a firewall-like "capabilities-rules program" from the collected information, and run it, to determine whether it should grant the process that capability. (This means that anything that already has capabilities granted from its code-signed capabilities manifest doesn't need to sit around waiting for this capabilities-ruleset program to be built and run. Unless the app's capabilities manifest didn't grant the specific capability it's trying to use.)
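The flow described in this paragraph can be sketched as a toy model. Everything here is an assumption made for illustration: `ToyProcess`, the `"full-disk-access"` token, and `evaluate_ruleset` are invented names, and the model only mirrors the description above, not confirmed macOS behaviour.

```python
class ToyProcess:
    """Toy model: a process holds a set of already-granted capabilities."""

    def __init__(self, granted=()):
        self.granted = set(granted)
        self.slow_evaluations = 0  # how often the slow path was taken

    def check_read(self, path, evaluate_ruleset):
        # Fast path: a broad grant or an earlier per-path grant already covers it.
        if "full-disk-access" in self.granted or ("read", path) in self.granted:
            return True
        # Slow path: ask the (hypothetical) ruleset evaluator and remember a grant.
        self.slow_evaluations += 1
        allowed = evaluate_ruleset(path)
        if allowed:
            self.granted.add(("read", path))
        return allowed


def allow_everything(path):
    """Stand-in for the expensive ruleset evaluation."""
    return True


paths = ["/tmp/a", "/tmp/b", "/tmp/a"]

plain = ToyProcess()
for p in paths:
    plain.check_read(p, allow_everything)

broad = ToyProcess(granted={"full-disk-access"})
for p in paths:
    broad.check_read(p, allow_everything)

print(plain.slow_evaluations, broad.slow_evaluations)
```

In this model the process without a broad grant takes the slow path once per distinct path, while the one that starts with the broad grant never does.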

Unlike macOS app-bundles, regular (i.e. freshly-compiled) BSD-userland executable binaries don't have a capabilities manifest of their own, so they don't start with any process-local capabilities. (You can embed one into them, but the process has to be "capabilities-aware" to actually make use of it, so e.g. GNU coreutils from Homebrew isn't gonna be helped by this. Oh, and it won't kick in if the program isn't also code-signed, IIRC.)

But all processes inherit their capabilities from their runtime ancestors, so there's a simple fix, for the case of running CLI software interactively: grant your terminal emulator the capabilities you need through Preferences. In this case, the "Full Disk Access" capability. Then, since all your CLI processes have your terminal emulator as a runtime ancestor-process, all your CLI processes will inherit that capability, and thus not need to spend time requesting it from the security subsystem.
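To see which processes are the runtime ancestors of a given CLI process, a small sketch along these lines may help. It assumes a `ps` that accepts `-o ppid= -p PID` (true of the macOS and common Linux versions); `parent_via_ps` and `ancestor_chain` are hypothetical helper names.

```python
import subprocess


def parent_via_ps(pid):
    """Return the parent PID of `pid` using `ps`, or None if it cannot be found."""
    result = subprocess.run(
        ["ps", "-o", "ppid=", "-p", str(pid)],
        capture_output=True,
        text=True,
        check=False,
    )
    text = result.stdout.strip()
    return int(text) if text.isdigit() else None


def ancestor_chain(pid, parent_of=parent_via_ps):
    """Return [pid, parent, grandparent, ...] up to PID 1 or an unknown parent."""
    chain = [pid]
    while chain[-1] not in (0, 1):
        parent = parent_of(chain[-1])
        if parent is None or parent in chain:
            break  # lookup failed, or a loop that should never happen
        chain.append(parent)
    return chain
```

Calling `ancestor_chain(os.getpid())` (after `import os`) from a script started in a terminal should list the shell and the terminal emulator somewhere in the chain. The `parent_of` parameter exists only so the walk can be tried with a fake process table.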

Note that this doesn't apply to BSD-userland executable binaries which run as LaunchDaemons, since those aren't being spawned by your terminal emulator. Those either need to learn to use capabilities for real; or, at least, they need to get exec(2)ed by a shim binary that knows how.

-----

tl;dr: I had this problem (slowness in numerous CLI apps, most obvious as `brew upgrade` suddenly taking forever) after upgrading to 10.15 as well. Granting "Full Disk Access" to iTerm fixed it for me.

replies(2): >>23274332 #>>23274780 #
1. jfkebwjsbx ◴[] No.23274332[source]
Why would sandboxing be slower?

They are definitely doing something way too slow.

replies(1): >>23274459 #
2. derefr ◴[] No.23274459[source]
Apple replaced the very simple (i.e. function fits in a cache line; inputs fit in a single dword) BSD user/group/other filesystem privileges system, with a Lisp interpreter (or maybe compiler? not sure) executing some security DSL[1][2].

[1] https://wiki.mozilla.org/Sandbox/OS_X_Rule_Set

[2] https://reverse.put.as/wp-content/uploads/2011/09/Apple-Sand...

This capabilities-ruleset interpreter is what Apple uses the term "Gatekeeper" to refer to, mostly. It had already been put in charge of authorizing most Cocoa-land system interactions as of 10.12. But the capabilities-ruleset interpreter wasn't in the code-path for any BSD-land code until 10.15.

A capabilities-ruleset "program" for this interpreter can be very simple (and thus quick to execute), or arbitrarily complex. In terms of how complex a ruleset can get—i.e. what the interpreter's runtime allows it to take into consideration in a single grant evaluation—it knows about all the filesystem bitflags BSD used to, plus Gatekeeper-level grants (e.g. the things you do in Preferences; the "com.apple.quarantine" xattr), plus external system-level capabilities "hotfixes" (i.e. the same sort of "rewrite the deployed code after the fact" fixes that GPU makers deploy to make games run better, but for security instead of performance), plus some stuff (that I don't honestly know too much about) that can require it to contact Apple's servers during the ruleset execution. Much of this stuff can be cached between grant requests, but some of it will inevitably have to hit the disk (or the network!) for a lookup—in the middle of a blocking syscall.

I'm not sure whether it's the implementation (an in-kernel VM doesn't imply slowness; see eBPF) or the particular checks that need to be done, but either way, it adds up to a bit of synchronous slowness per call.

The real killer that makes you notice the problem, though, isn't the per-call overhead, but rather that the whole security subsystem seems to now have an OS-wide concurrency bottleneck in it for some reason. I'm not sure where it is, exactly; the "happy path" for capabilities-grants shouldn't make any Mach IPC calls at all. But it's bottlenecked anyway. (Maybe there's Mach IPC for audit logging?)

The security framework was pretty obviously structured to expect that applications would only send it O(1) capability-grant requests, since the idiomatic thing to do when writing a macOS Cocoa-userland application, if you want to work with a directory's contents, is to get a capability on a whole directory-tree from a folder-picker, and then use that capability to interact with the files.

Under such an approach, the sandbox system would never be asked too many questions at a time, and so you'd never really end up in a situation where the security system is going to be bottlenecked for very long. You'd mostly notice it as increased post-reboot startup latency, not as latency under regular steady-state use.

Under an approach where you've got many concurrent BSD "filesystem walker" processes, each spamming individual fopen(2)-triggered capability requests into the security system, though, a failure-to-scale becomes very apparent. Individual capabilities-grant requests go from taking 0.1s to resolve, to sometimes over 30s. (It's very much like the kind of process-inbox bottlenecks you see in Erlang, that are solved by using process pools or ETS tables.)
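A back-of-the-envelope sketch of how 0.1s can turn into 30s, assuming (purely as a model, not as a statement about how macOS is built) that every request is served one at a time by a single shared resource:

```python
def queued_latencies(num_requests, service_time):
    """Latency of each request when all arrive at once and are served one at a time."""
    # Request k (1-based) waits for the k-1 requests ahead of it, then is served itself.
    return [k * service_time for k in range(1, num_requests + 1)]


latencies = queued_latencies(num_requests=300, service_time=0.1)
print(f"first: {latencies[0]:.1f}s, last: {latencies[-1]:.1f}s")
```

Under that assumption, a backlog of 300 requests at 0.1s each is enough for the last one to wait about 30s, even though no single evaluation got any slower.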

Either Apple should have rethought the IPC architecture of sandboxing in 10.15, but forgot/deprioritized this; or they should have made their BSD libc transparently handle "push down" of capabilities to descendent requests, but forgot/deprioritized that.

replies(3): >>23275708 #>>23281159 #>>23282252 #
3. saagarjha ◴[] No.23275708[source]
> Lisp interpreter (or maybe compiler? not sure)

I believe it is actually a Scheme dialect, and I would be very surprised if it is not compiled to some internal representation upon load.

> This capabilities-ruleset interpreter is what Apple uses the term "Gatekeeper" to refer to, mostly.

I am fairly sure Gatekeeper is mostly just Quarantine and other bits that prevent the execution of random things you download from the internet.

replies(1): >>23277757 #
4. lioeters ◴[] No.23277757{3}[source]
In the Apple Sandbox Guide v1.0 [1], it mentions Dionysus Blazakis' paper [2] presented at Blackhat DC 2011.

In the latter, Apple's sandbox rule set (custom profiles) is called SBPL - Sandbox Profile Language - and is described as a "Scheme embedded domain specific language".

It's evaluated by libSandbox, which contains TinyScheme! [3]

From what I could understand, the Scheme interpreter generates a blob suitable for passing to the kernel.

---

[1] https://reverse.put.as/wp-content/uploads/2011/09/Apple-Sand...

[2] https://media.blackhat.com/bh-dc-11/Blazakis/BlackHat_DC_201...

[3] http://tinyscheme.sourceforge.net/home.html

replies(1): >>23278131 #
5. saagarjha ◴[] No.23278131{4}[source]
That sounds about right. I was doing some work in this area very recently, which found a couple of methods to bypass sandboxing entirely, but somewhat humorously the issues did not require me to have any understanding of how the lower levels of this worked ;)
replies(1): >>23278366 #
6. lioeters ◴[] No.23278366{5}[source]
Blazakis' paper is a fascinating investigative/exploratory work, delving deep into the sandbox mechanism. I learned more than I wanted to know!
replies(1): >>23279434 #
7. saagarjha ◴[] No.23279434{6}[source]
Yeah, it's on my reading list :)
8. comex ◴[] No.23281159[source]
The Scheme interpreter only runs when compiling a sandbox. It's compiled into a simple non-Turing-complete bytecode, and that's what's consulted on every syscall. This has been the case since… 10.5 or something. It's always been on the path for BSD code. And Cocoa operations lower to BSD syscalls anyway. There's no system for them to get a "capability" for a directory tree; on the contrary, file descriptors ought to be able to serve as capabilities, but the Sandbox kext stupidly computes the full path for every file that's accessed before matching it against a bunch of regexes. This too has been the case as long as Sandbox has existed.
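The "compute the full path, then match it against a bunch of regexes" step can be sketched roughly like this. The rules and names are hypothetical, and the real check runs compiled bytecode in the kernel; this only illustrates why the work is repeated on every access.

```python
import posixpath
import re

# Hypothetical rules, checked in order; the first matching pattern decides.
RULES = [
    (re.compile(r"^/private/secret(/|$)"), False),
    (re.compile(r"^/usr/lib(/|$)"), True),
    (re.compile(r"^/tmp(/|$)"), True),
]


def is_allowed(path, rules=RULES, default=False):
    """Normalize `path`, then return the verdict of the first matching rule."""
    # normpath collapses "." and ".." but does not resolve symlinks.
    full_path = posixpath.normpath(path)
    for pattern, allowed in rules:
        if pattern.search(full_path):
            return allowed
    return default


print(is_allowed("/usr/lib/libc.dylib"))
print(is_allowed("/tmp/../private/secret/key"))
```

Because the verdict depends on the full path string, nothing is reused between two accesses to files in the same directory, which is the contrast with treating a file descriptor as a capability.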

There is a bunch of new stuff in 10.15, mostly involving binary execs (and I don't understand all of it), but I'm pretty sure it doesn't match what you're describing.

9. jfkebwjsbx ◴[] No.23282252[source]
> Much of this stuff can be cached between grant requests, but some of it will inevitably have to hit the disk (or the network!) for a lookup—in the middle of a blocking syscall.

Running any kind of I/O during a capability check is a broken design.

There is no reason to hit the disk (it should be preloaded), much less the network (such a design will never work if offline).