←back to thread

Futurelock: A subtle risk in async Rust

(rfd.shared.oxide.computer)
431 points bcantrill | 1 comments | | HN request time: 0.267s | source

This RFD describes our distillation of a really gnarly issue that we hit in the Oxide control plane.[0] Not unlike our discovery of the async cancellation issue[1][2][3], this is larger than the issue itself -- and worse, the program that hits futurelock is correct from the programmer's point of view. Fortunately, the surface area here is smaller than that of async cancellation and the conditions required to hit it can be relatively easily mitigated. Still, this is a pretty deep issue -- and something that took some very seasoned Rust hands quite a while to find.

[0] https://github.com/oxidecomputer/omicron/issues/9259

[1] https://rfd.shared.oxide.computer/rfd/397

[2] https://rfd.shared.oxide.computer/rfd/400

[3] https://www.youtube.com/watch?v=zrv5Cy1R7r4

Show context
mleonhard ◴[] No.45779198[source]
> // Start a background task that takes the lock and holds it for a few seconds.

Holding a lock while waiting for IO can destroy a system's performance. With async Rust, we can prevent this by making the MutexGuard !Send, so it cannot be held across an await. Specifically, because it is !Send, it cannot be stored in the Future [2], so it must be dropped immediately, freeing the lock. This also prevents Futurelock deadlock.

This is how I wrote safina::sync::Mutex [0]. I did try to make it Send, like Tokio's MutexGuard, but stopped when I realized that it would become very complicated or require unsafe.

> You could imagine an unfair Mutex that always woke up all waiters and let them race to grab the lock again. That would not suffer from risk of futurelock, but it would have the thundering herd problem plus all the liveness issues associated with unfair synchronization primitives.

Thundering herd is when clients overload servers. This simple Mutex has O(n^2) runtime: every task must acquire and release the mutex, which adds all waiting tasks to the scheduler queue. In practice, scheduling a task is very fast (~600ns). As long as polling the lock-mutex-future is fast and you have <500 waiting tasks, then the O(n^2) runtime is fine.

Performance is hard to predict. I wrote Safina using the simplest possible implementations and assumed they would be slow. Then I wrote some micro-benchmarks and found that some parts (like the async Mutex) actually outperform Tokio's complicated versions [1]. I spent days coding optimizations that did not improve performance (work stealing) or even reduced performance (thread affinity). Now I'm hesitant to believe assumptions and predictions about performance, even if they are based on profiling data.

[0] https://docs.rs/safina/latest/safina/sync/struct.MutexGuard....

[1] https://docs.rs/safina/latest/safina/index.html#benchmark

[2] Multi-threaded async executors require futures to be Send.

replies(3): >>45780445 #>>45781201 #>>45783200 #
ufmace ◴[] No.45783200[source]
Considering this issue did also make me think - maybe the real footgun here is the async mutex. I think a better "rule" to avoid this issue might be something like, don't just use the tokio async mutex by default just because it's there and you're in an async function; instead default to a sync mutex that errors when held across awaits and think very hard about what you're really doing before you switch to the async one.
replies(1): >>45785979 #
1. ufmace ◴[] No.45785979[source]
Actually I think I might be a little misguided here - confusing a mutex with an awaitable lock method versus blocking, and a mutex whose LockGuard is Send and can be held across other await points.

To clarify, I do still think it's probably wise to prefer using a mutex whose LockGuard is not Send. If you're in an async context though, it seems clearly preferable to use a mutex that lets you await on lock instead of possibly blocking. Looks like that's what that Safina gives you.

It does bring to mind the point though - does it really make sense to call all of these things Mutexes? Most Mutexes, including the one in std, seem relatively simplistic, with no provision for exactly what happens if multiple threads/tasks are waiting to acquire the lock. As if they're designed for the case of, it's probably rare to never for multiple threads to actually need this thing at once, but we have to guard against it just to be certain. The case of this resource is in high demand by a bunch of threads, we expect there to be a lot of time spent by a lot of threads waiting to get the lock, so it's actually important which lock requesters actually get the lock in what order, seems different enough that it maybe ought to have a different name and more flexibility and selection as to what algorithm is being used to control the lock order.