Futurelock: A subtle risk in async Rust

1. dvt ◴[31 Oct 25 20:50 UTC] No.45776550[source]▶

> &mut future1 is dropped, but this is just a reference and so has no effect. Importantly, the future itself (future1) is not dropped.

There's a lot of talk about Rust's await implementation, but I don't really think that's the issue here. After all, Rust doesn't guarantee convergence. Tokio, on the other hand (being a library that handles multi-threading), should (at least when using its own constructs, e.g. the `select!` macro).

So, since the crux of the problem is the `tokio::select!` macro, it seems like a pretty clear tokio bug. Side note, I never looked at it before, but the macro[1] is absolutely hideous.

[1] https://docs.rs/tokio/1.34.0/src/tokio/macros/select.rs.html

replies(4): >>45776605 #>>45776791 #>>45776868 #>>45777449 #

2. raggi ◴[31 Oct 25 20:57 UTC] No.45776605[source]▶

>>45776550 (TP) #

i forget if this part unwinds to the exact same place, but some of this kind of design constraint in tokio stems from the much earlier language capabilities and is prohibitive to adjust without breaking the user ecosystem.

one of the key advertised selling points in some of the other runtimes was specifically around behavior of tasks on drop of their join handles for example, for reasons closely related to this post.

3. oconnor663 ◴[31 Oct 25 21:16 UTC] No.45776791[source]▶

>>45776550 (TP) #

There's nothing `select!` could do here to force `future1` to drop, because it doesn't receive ownership of `future1`. If we wanted to force this, we'd have to forbid `select!` from polling futures by reference, but that's a pretty fundamental capability that we often rely on to `select!` in a loop for example. The blanket `impl<F> Future for &mut F where F: Future ...` isn't a Tokio thing either; that's in the standard library.

replies(1): >>45781430 #

4. mycoliza ◴[31 Oct 25 21:24 UTC] No.45776868[source]▶

>>45776550 (TP) #

What could `tokio::select!` do differently here to prevent bugs like this?

In the case of `select!`, it is a direct consequence of the ability to poll a `&mut` reference to a future in a `select!` arm, where the future is not dropped should another future win the "race" of the select. This is not really a choice Tokio made when designing `select!`, but is instead due to the existence of implementations of `Future` for `&mut T: Future + Unpin`[1] and `Pin<T: Future>`[2] in the standard library.

Tokio's `select!` macro cannot easily stop the user from doing this, and, furthermore, the fact that you can do this is useful --- there are many legitimate reasons you might want to continue polling a future if another branch of the select completes first. It's desirable to be able to express the idea that we want to continually poll drive one asynchronous operation to completion while periodically checking if some other thing has happened and taking action based on that, and then continue driving forward the ongoing operation. That was precisely what the code in which we found the bug was doing, and it is a pretty reasonable thing to want to do; a version of the `select!` macro which disallows that would limit its usefulness. The issue arises specifically from the fact that the `&mut future` has been polled to a state in which it has acquired, but not released, a shared lock or lock-like resource, and then another arm of the `select!` completes first and the body of that branch runs async code that also awaits that shared resource.

If you can think of an API change which Tokio could make that would solve this problem, I'd love to hear it. But, having spent some time trying to think of one myself, I'm not sure how it would be done without limiting the ability to express code that one might reasonably want to be able to write, and without making fundamental changes to the design of Rust async as a whole.

[1] https://doc.rust-lang.org/stable/std/future/trait.Future.htm... [2]: https://doc.rust-lang.org/stable/std/future/trait.Future.htm...

replies(3): >>45777894 #>>45778241 #>>45778288 #

5. dap ◴[31 Oct 25 22:35 UTC] No.45777449[source]▶

>>45776550 (TP) #

(author here)

Although the design of the `tokio::select!` macro creates ways to run into this behavior, I don't believe the problem is specific to `tokio`. Why wouldn't the example from the post using Streams happen with any other executor?

replies(1): >>45780559 #

6. amluto ◴[31 Oct 25 23:37 UTC] No.45777894[source]▶

>>45776868 #

I can imagine an alternate universe in which you cannot do:

1. Create future A.

2. Poll future A at least once but not provably poll it to completion and also not drop it. This includes selecting it.

3. Pause yourself by awaiting anything that does not involve continuing to poll A.

I’m struggling a bit to imagine the scenario in which it makes sense to pause a coroutine that you depend on in the middle like this. But I also don’t immediately see a way to change a language like Rust to reliably prevent doing this without massively breaking changes. See my other comment :)

7. grogers ◴[01 Nov 25 00:38 UTC] No.45778241[source]▶

>>45776868 #

I'm not familiar with tokio, but I am familiar with folly coro in C++ which is similiar-ish. You cannot co_await a folly::coro::Task by reference, you must move it. It seems like that prevents this bug. So maybe select! is the low level API and a higher level (i.e. safer) abstraction can be built on top?

8. rtpg ◴[01 Nov 25 00:47 UTC] No.45778288[source]▶

>>45776868 #

A meta-idea I have: look at all usages of `select!` with `&mut future`s in the code, and see if there are maybe 4 or 5 patterns that emerge. With that it might be possible to say "instead of `select!` use `poll_continuing!` or `poll_first_up!` or `poll_some_other_common_pattern!`".

It feels like a lot of the way Rust untangles these tricky problems is by identifying slightly more contextful abstractions, though at the cost of needing more scratch space in the mind for various methods

9. dvt ◴[01 Nov 25 10:17 UTC] No.45780559[source]▶

>>45777449 #

First of all, great write-up! Had a blast reading it :) I think there's a difference between a language giving you a footgun and a library giving you a footgun. Libraries, by definition, are supposed to be as user-friendly as possible.

For example, I can just do `loop { }` which the language is perfectly okay with letting me do anywhere in my code (and essentially hanging execution). But if I'm using a library and I'm calling `innocuous()` and there's a `loop { }` buried somewhere in there, that is (in my opinion) the library's responsibility.

N.B. I don't know enough about tokio's internals to suggest any changes and don't want to pretend like I'm an expert, but I do think this caveat should be clearly documented and a "safe" version of `select!` (which wouldn't work with references) should be provided.

10. kibwen ◴[01 Nov 25 13:19 UTC] No.45781430[source]▶

>>45776791 #

Surely not every use of `select!` needs this ability. If you can design a more restrictive interface that makes correctness easier to determine, then you should use that interface where you can, and reserve `select!` for only those cases where you can't.