
Futurelock: A subtle risk in async Rust

(rfd.shared.oxide.computer)
421 points bcantrill | 6 comments

This RFD describes our distillation of a really gnarly issue that we hit in the Oxide control plane.[0] Not unlike our discovery of the async cancellation issue[1][2][3], this is larger than the issue itself -- and worse, the program that hits futurelock is correct from the programmer's point of view. Fortunately, the surface area here is smaller than that of async cancellation and the conditions required to hit it can be relatively easily mitigated. Still, this is a pretty deep issue -- and something that took some very seasoned Rust hands quite a while to find.

[0] https://github.com/oxidecomputer/omicron/issues/9259

[1] https://rfd.shared.oxide.computer/rfd/397

[2] https://rfd.shared.oxide.computer/rfd/400

[3] https://www.youtube.com/watch?v=zrv5Cy1R7r4

1. octoberfranklin ◴[] No.45777234[source]
For anybody who wants to cut to the chase, it's this:

> The behavior of tokio::select! is to poll all branches' futures only until one of them returns `Ready`. At that point, it drops the other branches' futures and only runs the body of the branch that’s ready.

This is, unfortunately, doing what it's supposed to do: acting as a footgun.

The design of tokio::select!() implicitly assumes it can cancel tasks cleanly by simply dropping them. We learned the hard way back in the Java days that you cannot kill threads cleanly all the time. Unsurprisingly, the same thing is true for async tasks. But I guess every generation of programmers has to re-learn this lesson. Because, you know, actually learning from history would be too easy.
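For anyone who wants to see the quoted behavior in isolation, here is a minimal std-only sketch. It is not tokio's implementation: `select_first`, `Countdown`, and `NoopWake` are hypothetical names made up for illustration, and the loop busy-polls instead of waiting for wakeups. It only shows the shape of "poll both, take the first `Ready`, drop the loser mid-flight".

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough for polling by hand.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical future: returns Pending `remaining` times, then Ready.
/// On drop it records how many polls it still needed.
struct Countdown {
    name: &'static str,
    remaining: u32,
    state_at_drop: Rc<Cell<Option<u32>>>,
}

impl Future for Countdown {
    type Output = &'static str;
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.remaining == 0 {
            Poll::Ready(self.name)
        } else {
            self.remaining -= 1;
            Poll::Pending
        }
    }
}

impl Drop for Countdown {
    fn drop(&mut self) {
        self.state_at_drop.set(Some(self.remaining));
    }
}

/// Select-like helper (an assumption, not tokio's code): polls both futures
/// in turn and returns the first output. Both futures are owned, so the
/// loser is dropped on return even though it never completed.
fn select_first(mut a: Countdown, mut b: Countdown) -> &'static str {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = Pin::new(&mut a).poll(&mut cx) {
            return out;
        }
        if let Poll::Ready(out) = Pin::new(&mut b).poll(&mut cx) {
            return out;
        }
    }
}

/// Returns the winner's name and the loser's unfinished poll count at drop.
fn demo() -> (&'static str, Option<u32>) {
    let slow_state = Rc::new(Cell::new(None));
    let fast = Countdown { name: "fast", remaining: 1, state_at_drop: Rc::new(Cell::new(None)) };
    let slow = Countdown { name: "slow", remaining: 5, state_at_drop: Rc::clone(&slow_state) };
    let winner = select_first(fast, slow);
    (winner, slow_state.get())
}
```

Calling `demo()` yields the winner plus the state the slow future was in when it got dropped, which is exactly the "cancel by drop" moment being argued about here.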

Unfortunately there are a bunch of footguns in tokio (and async-std too). The state-machine transformation inside rustc is a thing of beauty, but the libraries and APIs layered on top of that should have been iterated many more times before being rolled out into widespread use.

replies(2): >>45777338 #>>45778409 #
2. littlestymaar ◴[] No.45777338[source]
I genuinely don't understand why people use select! at all given how much of a footgun it is.
replies(1): >>45777400 #
3. octoberfranklin ◴[] No.45777400[source]
Well the less-footgun-ish alternative would look something like a Stream API, but the last time I checked tokio-stream wasn't stable yet.

Then you could merge a `Stream<A>` and `Stream<B>` into a `Stream<Either<A,B>>` and pull from that. Since you're dealing with owned streams, dropping the stream forces some degree of cleanup. There are still ways to make a mess, but they take more effort.
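Roughly the shape I have in mind, as a std-only sketch. Everything here is hypothetical and simplified for illustration: `MiniStream` stands in for a real `Stream` trait (no pinning), `Either` stands in for the one from the `either` crate, and `Queue` is an always-ready in-memory source. It shows the merge only; it makes no claim about whether this avoids the underlying problem.

```rust
use std::collections::VecDeque;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough for polling by hand.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical two-variant sum type.
#[derive(Debug, PartialEq)]
enum Either<A, B> {
    Left(A),
    Right(B),
}

/// Hypothetical, simplified stream trait (no Pin, unlike real Stream traits).
trait MiniStream {
    type Item;
    fn poll_next(&mut self, cx: &mut Context<'_>) -> Poll<Option<Self::Item>>;
}

/// In-memory stream that is always ready.
struct Queue<T>(VecDeque<T>);
impl<T> MiniStream for Queue<T> {
    type Item = T;
    fn poll_next(&mut self, _cx: &mut Context<'_>) -> Poll<Option<T>> {
        Poll::Ready(self.0.pop_front())
    }
}

/// Owns both sources; a finished source is dropped as soon as it ends.
struct Merge<A, B> {
    a: Option<A>,
    b: Option<B>,
    a_first: bool,
}

impl<A: MiniStream, B: MiniStream> MiniStream for Merge<A, B> {
    type Item = Either<A::Item, B::Item>;
    fn poll_next(&mut self, cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
        // Try each side at most once, alternating which side goes first.
        for _ in 0..2 {
            let try_a = self.a_first;
            self.a_first = !self.a_first;
            if try_a {
                if let Some(a) = self.a.as_mut() {
                    match a.poll_next(cx) {
                        Poll::Ready(Some(x)) => return Poll::Ready(Some(Either::Left(x))),
                        Poll::Ready(None) => self.a = None,
                        Poll::Pending => {}
                    }
                }
            } else if let Some(b) = self.b.as_mut() {
                match b.poll_next(cx) {
                    Poll::Ready(Some(x)) => return Poll::Ready(Some(Either::Right(x))),
                    Poll::Ready(None) => self.b = None,
                    Poll::Pending => {}
                }
            }
        }
        if self.a.is_none() && self.b.is_none() {
            Poll::Ready(None)
        } else {
            Poll::Pending
        }
    }
}

/// Drains a merged stream built from two small queues.
fn demo() -> Vec<Either<i32, &'static str>> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut merged = Merge {
        a: Some(Queue(VecDeque::from([1, 2]))),
        b: Some(Queue(VecDeque::from(["x"]))),
        a_first: true,
    };
    let mut out = Vec::new();
    while let Poll::Ready(Some(item)) = merged.poll_next(&mut cx) {
        out.push(item);
    }
    out
}
```

The consumer pulls from one owned `Merge` value instead of racing two futures, so there is a single place where both sources get polled and a single place where they get dropped.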

   ....................................
Ratelimit so I have to reply to mycoliza with an edit here:

That example calls `do_thing()`, whose body does not appear anywhere in the webpage. Use better identifiers.

If you meant `do_stuff()`, you haven't replaced select!() with streams, since `do_stuff()` calls `select!()`.

The problem is `select!()`; if you keep using `select!()` but just slather on a bunch of streams, that isn't going to fix anything. You have to get rid of `select!()` by replacing it with streams.

replies(2): >>45777445 #>>45777879 #
4. mycoliza ◴[] No.45777445{3}[source]
An analogous problem is equally possible with streams: https://rfd.shared.oxide.computer/rfd/0609#_how_you_can_hit_...
5. mycoliza ◴[] No.45777879{3}[source]
In reply to your edit, that section in the RFD includes a link to the full example in the Rust playground. You’ll note that it does not make any use of `select!`: https://play.rust-lang.org/?version=stable&mode=debug&editio...

Perhaps the full example should have been reproduced in the RFD for clarity…

6. kmeisthax ◴[] No.45778409[source]
No, dropping a Rust future is an inherently safe operation. Futures don't live on their own, they only ever do work inside of .poll(), so you can't "catch them with their pants down" and corrupt state by dropping them. Yield points are specifically designed to be cancel-safe.

Crucially, however, because Futures have no independent existence, they can be indefinitely paused if you don't actively and repeatedly .poll() them, which is the moral equivalent of cancelling a Java Thread. And this is represented in language state as a leaked object, which is explicitly allowed in safe Rust, although the language still takes pains to avoid accidental leakage. The only correct way to use a future is to poll it to completion or drop it.
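To make the "paused" state concrete, here is a minimal std-only sketch. `HoldsResource` and `NoopWake` are hypothetical names; the `held` flag is a stand-in for something like a lock guard kept across an await point, not a real mutex.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough for polling by hand.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical future that "acquires" a resource on its first poll and
/// releases it only when it completes or is dropped.
struct HoldsResource {
    held: Rc<Cell<bool>>,
    started: bool,
}

impl Future for HoldsResource {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if !self.started {
            self.started = true;
            self.held.set(true); // acquire on first poll
            Poll::Pending
        } else {
            self.held.set(false); // release on completion
            Poll::Ready(())
        }
    }
}

impl Drop for HoldsResource {
    fn drop(&mut self) {
        self.held.set(false); // dropping also releases
    }
}

/// Returns (held after one poll with no further polls, held after drop).
fn demo() -> (bool, bool) {
    let held = Rc::new(Cell::new(false));
    let mut fut = HoldsResource { held: Rc::clone(&held), started: false };
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);

    // Poll once, then stop polling: the future is alive but makes no progress.
    let _ = Pin::new(&mut fut).poll(&mut cx);
    let while_unpolled = held.get();

    // Only completing or dropping the future releases the resource.
    drop(fut);
    (while_unpolled, held.get())
}
```

Between the first poll and the drop, nothing in the program can make the flag go back to `false` except polling or dropping that one future, which is the window the RFD is about.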

The problem is that in this situation, tokio::select! only borrows the future and thus can't drop it. It also doesn't know that dropping the borrow does nothing to the underlying future, because borrows of futures are still futures, so all the traits still match up. It's a combination of slightly unintuitive core language design and a major infrastructure library not thinking things out.
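A small std-only sketch of the trait mechanics, under the assumption that a select-like helper simply takes "some `F: Future + Unpin`" by value. `poll_once_then_drop`, `Tracked`, and `NoopWake` are hypothetical names for illustration. The standard library implements `Future` for `&mut F` when `F: Future + Unpin`, so the same generic code accepts either an owned future or a borrow of one.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough for polling by hand.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical future that never completes and records when it is dropped.
struct Tracked {
    dropped: Rc<Cell<bool>>,
}

impl Future for Tracked {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        Poll::Pending
    }
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.dropped.set(true);
    }
}

/// Stand-in for a losing select branch: polls whatever it was given once,
/// then drops it. It cannot tell an owned future from a borrowed one.
fn poll_once_then_drop<F: Future + Unpin>(mut fut: F) -> bool {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let ready = Pin::new(&mut fut).poll(&mut cx).is_ready();
    drop(fut);
    ready
}

/// Returns (dropped after passing `&mut fut`, dropped after passing `fut`).
fn demo() -> (bool, bool) {
    let dropped = Rc::new(Cell::new(false));
    let mut fut = Tracked { dropped: Rc::clone(&dropped) };

    // F = &mut Tracked: only the borrow is dropped, the future lives on unpolled.
    poll_once_then_drop(&mut fut);
    let after_borrowed = dropped.get();

    // F = Tracked: the future itself is dropped.
    poll_once_then_drop(fut);
    (after_borrowed, dropped.get())
}
```

In the borrowed case the future survives the helper and sits there until its owner polls or drops it, which is how you end up with a started-but-unpolled future.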