←back to thread

Futurelock: A subtle risk in async Rust

(rfd.shared.oxide.computer)
421 points bcantrill | 1 comments | | HN request time: 0.203s | source

This RFD describes our distillation of a really gnarly issue that we hit in the Oxide control plane.[0] Not unlike our discovery of the async cancellation issue[1][2][3], this is larger than the issue itself -- and worse, the program that hits futurelock is correct from the programmer's point of view. Fortunately, the surface area here is smaller than that of async cancellation and the conditions required to hit it can be relatively easily mitigated. Still, this is a pretty deep issue -- and something that took some very seasoned Rust hands quite a while to find.

[0] https://github.com/oxidecomputer/omicron/issues/9259

[1] https://rfd.shared.oxide.computer/rfd/397

[2] https://rfd.shared.oxide.computer/rfd/400

[3] https://www.youtube.com/watch?v=zrv5Cy1R7r4

1. oconnor663 ◴[] No.45776463[source]
> FAQ: doesn’t future1 get cancelled?

I guess cancellation is really two different things, which usually happen at the ~same time, but not in this case: 1) the future stops getting polled, and 2) the future gets dropped. In this example the drop is delayed, and because the future is holding a guard,* the delay has side effects. So the future "has been cancelled" in the sense that it will never again make forward progress, but it "hasn't been cancelled yet" in the sense that it's still holding resources. I wonder if it's practical to say "make sure those two things always happen together"?

* Technically a Tokio-internal `Acquire` future that owns a queue position to get a guard, but it sounds like the exact same bug could manifest after it got the guard too, so let's call it a guard.