Futurelock: A subtle risk in async Rust

(rfd.shared.oxide.computer)

421 points bcantrill | 3 comments | 31 Oct 25 16:49 UTC

This RFD describes our distillation of a really gnarly issue that we hit in the Oxide control plane.[0] Not unlike our discovery of the async cancellation issue[1][2][3], this is larger than the issue itself -- and worse, the program that hits futurelock is correct from the programmer's point of view. Fortunately, the surface area here is smaller than that of async cancellation and the conditions required to hit it can be relatively easily mitigated. Still, this is a pretty deep issue -- and something that took some very seasoned Rust hands quite a while to find.

[0] https://github.com/oxidecomputer/omicron/issues/9259

[1] https://rfd.shared.oxide.computer/rfd/397

[2] https://rfd.shared.oxide.computer/rfd/400

[3] https://www.youtube.com/watch?v=zrv5Cy1R7r4

octoberfranklin ◴[31 Oct 25 22:09 UTC] No.45777234[source]▶

>>45774086 (OP) #

For anybody who wants to cut to the chase, it's this:

> The behavior of tokio::select! is to poll all branches' futures only until one of them returns `Ready`. At that point, it drops the other branches' futures and only runs the body of the branch that’s ready.

This is, unfortunately, doing what it's supposed to do: acting as a footgun.

The design of tokio::select!() implicitly assumes it can cancel tasks cleanly by simply dropping them. We learned the hard way back in the Java days that you cannot kill threads cleanly all the time. Unsurprisingly, the same thing is true for async tasks. But I guess every generation of programmers has to re-learn this lesson. Because, you know, actually learning from history would be too easy.

Unfortunately there are a bunch of footguns in tokio (and async-std too). The state-machine transformation inside rustc is a thing of beauty, but the libraries and APIs layered on top of that should have been iterated many more times before being rolled out into widespread use.
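The `select!` behavior quoted above (poll every branch until one returns `Ready`, then drop the rest) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not tokio's implementation: `select2`, `block_on`, `YieldTimes`, `SetOnDrop`, and `demo` are hypothetical helpers written for this example, and unlike `tokio::select!`, which by default starts from a randomly chosen branch, the sketch polls its two futures in a fixed order.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; sufficient for the busy-polling loop below.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical stand-in for an executor: polls `fut` until it is ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

/// Future that returns `Pending` the given number of times, then completes.
struct YieldTimes(u32);
impl Future for YieldTimes {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 == 0 {
            return Poll::Ready(());
        }
        self.0 -= 1;
        Poll::Pending
    }
}

/// Records in a shared flag that it has been dropped.
struct SetOnDrop(Arc<AtomicBool>);
impl Drop for SetOnDrop {
    fn drop(&mut self) {
        self.0.store(true, Ordering::SeqCst);
    }
}

enum Either<A, B> {
    Left(A),
    Right(B),
}

/// Polls `a`, then `b`, on every wakeup until one is ready. The loser is
/// dropped, unfinished, when this function returns.
async fn select2<A: Future, B: Future>(a: A, b: B) -> Either<A::Output, B::Output> {
    let mut a = pin!(a);
    let mut b = pin!(b);
    std::future::poll_fn(|cx| {
        if let Poll::Ready(v) = a.as_mut().poll(cx) {
            return Poll::Ready(Either::Left(v));
        }
        if let Poll::Ready(v) = b.as_mut().poll(cx) {
            return Poll::Ready(Either::Right(v));
        }
        Poll::Pending
    })
    .await
}

/// Races a slow future against a fast one.
/// Returns (winner, slow future was dropped, slow future ran to completion).
fn demo() -> (&'static str, bool, bool) {
    let dropped = Arc::new(AtomicBool::new(false));
    let finished = Arc::new(AtomicBool::new(false));
    let guard = SetOnDrop(dropped.clone());
    let finished_flag = finished.clone();

    let slow = async move {
        let _guard = guard; // held across the await point below
        YieldTimes(1_000).await;
        finished_flag.store(true, Ordering::SeqCst); // never reached when cancelled
        "slow"
    };
    let fast = async {
        YieldTimes(2).await;
        "fast"
    };

    let winner = block_on(async {
        match select2(slow, fast).await {
            Either::Left(v) | Either::Right(v) => v,
        }
    });
    (winner, dropped.load(Ordering::SeqCst), finished.load(Ordering::SeqCst))
}

fn main() {
    let (winner, slow_dropped, slow_finished) = demo();
    println!("winner={winner} slow_dropped={slow_dropped} slow_finished={slow_finished}");
}
```

The slow branch is abandoned partway through: its guard is dropped, but the code after its `.await` never runs. Whatever that branch was in the middle of doing is cut short at an await point, which is the "cancel by dropping" assumption being criticized here.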

replies(2): >>45777338 #>>45778409 #

littlestymaar ◴[31 Oct 25 22:24 UTC] No.45777338[source]▶

>>45777234 #

I genuinely don't understand why people use select! at all given how much of a footgun it is.

replies(1): >>45777400 #

1. octoberfranklin ◴[31 Oct 25 22:29 UTC] No.45777400[source]▶

>>45777338 #

Well, the less-footgun-ish alternative would look something like a Stream API, but the last time I checked tokio-stream wasn't stable yet.

Then you could merge a `Stream<A>` and `Stream<B>` into a `Stream<Either<A,B>>` and pull from that. Since you're dealing with owned streams, dropping the stream forces some degree of cleanup. There are still ways to make a mess, but they take more effort.

   ....................................

Ratelimit so I have to reply to mycoliza with an edit here:

That example calls `do_thing()`, whose body does not appear anywhere in the webpage. Use better identifiers.

If you meant `do_stuff()`, you haven't replaced select!() with streams, since `do_stuff()` calls `select!()`.

The problem is `select!()`; if you keep using `select!()` but just slather on a bunch of streams, that isn't going to fix anything. You have to get rid of `select!()` by replacing it with streams.

replies(2): >>45777445 #>>45777879 #

2. mycoliza ◴[31 Oct 25 22:35 UTC] No.45777445[source]▶

>>45777400 (TP) #

An analogous problem is equally possible with streams: https://rfd.shared.oxide.computer/rfd/0609#_how_you_can_hit_...

3. mycoliza ◴[31 Oct 25 23:34 UTC] No.45777879[source]▶

>>45777400 (TP) #

In reply to your edit, that section in the RFD includes a link to the full example in the Rust playground. You’ll note that it does not make any use of `select!`: https://play.rust-lang.org/?version=stable&mode=debug&editio...

Perhaps the full example should have been reproduced in the RFD for clarity…
