Futurelock: A subtle risk in async Rust

(rfd.shared.oxide.computer)
421 points by bcantrill | 31 comments

This RFD describes our distillation of a really gnarly issue that we hit in the Oxide control plane.[0] Not unlike our discovery of the async cancellation issue[1][2][3], this is larger than the issue itself -- and worse, the program that hits futurelock is correct from the programmer's point of view. Fortunately, the surface area here is smaller than that of async cancellation and the conditions required to hit it can be relatively easily mitigated. Still, this is a pretty deep issue -- and something that took some very seasoned Rust hands quite a while to find.

[0] https://github.com/oxidecomputer/omicron/issues/9259

[1] https://rfd.shared.oxide.computer/rfd/397

[2] https://rfd.shared.oxide.computer/rfd/400

[3] https://www.youtube.com/watch?v=zrv5Cy1R7r4

1. singron ◴[] No.45776779[source]
This sounds very similar to priority inversion. E.g. if you have thread T_high running at high priority and thread T_low running at low priority, and T_low holds a lock that T_high wants to acquire, then T_high won't get to run until T_low gets scheduled.

The OS can detect this and make T_low "inherit" the priority of T_high. I wonder if a similar idea is possible with tokio? E.g. if you are awaiting a Mutex held by a future that "can't run", poll that future instead. I would guess detecting the "can't run" case would require quite a bit of overhead, but maybe it can be done.

I think an especially difficult factor is that you don't even need to use a direct await.

    let mut future1 = do_async_thing("op1", lock.clone()).boxed();
    tokio::select! {
      _ = &mut future1 => {
        println!("do_stuff: arm1 future finished");
      }
      _ = sleep(Duration::from_millis(500)) => {
        // No .await, but both will futurelock on future1.
        tokio::select! {
          _ = do_async_thing("op2", lock.clone()) => {},
          _ = do_async_thing("op3", lock.clone()) => {},
        };
      }
    };
I.e. so "can't run" detector needs to determine that no other task will run the future, and the future isn't in the current set of things being polled by this task.
replies(5): >>45776870 #>>45777270 #>>45777348 #>>45777553 #>>45781986 #
2. oconnor663 ◴[] No.45776870[source]
> I wonder if there is a similar idea possible with tokio? E.g. if you are awaiting a Mutex held by a future that "can't run", then poll that future instead.

Something like this could make sense for Tokio tasks. (I don't know how complicated their task scheduler is; maybe it already does stuff like this?) But it's not possible for futures within a task, as in this post. This goes all the way back to the "futures are inert" design of async Rust: You don't necessarily need to communicate with the runtime at all to create a future or to poll it or to stop polling it. You only need to talk to the runtime at the task level, either to spawn new tasks, or to wake up your own task. Futures are pretty much just plain old structs, and Tokio doesn't know how many futures my async function creates internally, any more than it knows about my integers or strings or hash maps.
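
To make the "inert" point concrete, here's a minimal sketch (assuming the `futures` crate for `noop_waker`; the async fn is made up) of constructing and polling a future entirely by hand, with no runtime involved:

    use std::future::Future;
    use std::pin::pin;
    use std::task::{Context, Poll};
    use futures::task::noop_waker;

    async fn add_one(x: u32) -> u32 { x + 1 }

    fn main() {
        // No runtime anywhere: the async fn is just a value until polled.
        let mut fut = pin!(add_one(41));
        let waker = noop_waker();
        let mut cx = Context::from_waker(&waker);
        // Polling is an ordinary method call on an ordinary struct.
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(v) => println!("ready: {v}"),
            Poll::Pending => println!("pending"),
        }
    }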

replies(2): >>45777185 #>>45777246 #
3. mycoliza ◴[] No.45777185[source]
Yeah, a coworker coming from Go asked a similar question about why Rust doesn't have something like the Go runtime's deadlock detector. Your comment is quite similar to the explanation I gave him.

Go, unlike Rust, does not really have a notion of intra-task concurrency; goroutines are the fundamental unit of concurrency and parallelism. So, the Go runtime can reason about dependencies between goroutines quite easily, since goroutines are the things it is responsible for scheduling. The fact that channels are a language construct, rather than a library construct implemented in the language, is necessary for this too.

In (async) Rust, on the other hand, tasks are the fundamental unit of parallelism, but not of concurrency; concurrency emerges from the composition of `Future`s, and a single task is a state machine which may execute any number of futures concurrently (but not in parallel), by polling each one until it cannot proceed without waiting and then moving on to the next. But critically, this is not what the task scheduler sees; it interacts with each task as a single top-level `Future`, and is not able to look inside at the nested futures it is composed of.

This specific failure mode can actually only happen when multiple futures are polled concurrently but not in parallel within a single Tokio task. So, there is actually no way for the Tokio scheduler to have insight into this problem. You could imagine a deadlock detector in the Tokio runtime that operates on the task level, but it actually could never detect this problem, because when these operations execute in parallel, it actually cannot occur. In fact, one of the suggestions for how to avoid this issue is to select over spawned tasks rather than futures within the same task.
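
As a hedged sketch of that last suggestion (reusing the hypothetical `do_async_thing` and `lock` from the snippet upthread): if each operation is spawned as its own task, a branch that loses the select! keeps running on the scheduler and can still release the lock:

    // Each operation becomes its own task, visible to the scheduler.
    let t2 = tokio::spawn(do_async_thing("op2", lock.clone()));
    let t3 = tokio::spawn(do_async_thing("op3", lock.clone()));
    tokio::select! {
        // A losing arm only drops its JoinHandle; the spawned task
        // itself keeps running to completion and releases the Mutex.
        _ = t2 => {},
        _ = t3 => {},
    };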

replies(2): >>45778715 #>>45780306 #
4. newpavlov ◴[] No.45777246[source]
>This goes all the way back to the "futures are inert" design of async Rust

Yeap. And this footgun is yet another addition to the long list of reasons why I consider the Rust async model with its "inert" futures managed in user space a fundamentally flawed un-Rusty design.

replies(1): >>45777690 #
5. Veserv ◴[] No.45777270[source]
I thought Rust async is a colored, stackless coroutine model, and that it would thus be unsafe to continue execution of previously executing async functions.

To explain: generally speaking, stackless coroutines only need coloring because they are really "independent stack"-less coroutines. They share the regular stack for their local state, which forces async function execution to proceed in LIFO order so you do not blow away the stack of the async function executing immediately after; this is what demands the state machine transform to be safe. It is why you need coloring, unlike stackful coroutine models, which can execute, yield, and complete in arbitrary order since their local state is preserved in a safe location.

replies(2): >>45777333 #>>45779712 #
6. treyd ◴[] No.45777333[source]
> thus it would be unsafe to continue execution of previously executing async functions.

There's more nuance than this. You can keep polling futures as often as you want. When an async fn gets converted into the state machine, yielding is just expressed as the poll fn returning "not ready" (Poll::Pending).

So it is actually possible for "a little bit" of work to happen, although that's limited and gets tricky, because the way wakers work ensures that futures normally only get polled by the runtime when there's actually work for them to do.
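
A hand-rolled future makes the mechanics visible; here's a minimal sketch of one that "yields" exactly once by returning not-ready on its first poll:

    use std::future::Future;
    use std::pin::Pin;
    use std::task::{Context, Poll};

    struct YieldOnce { yielded: bool }

    impl Future for YieldOnce {
        type Output = ();
        fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
            if self.yielded {
                Poll::Ready(())
            } else {
                // The "yield": ask to be polled again, then return Pending.
                self.yielded = true;
                cx.waker().wake_by_ref();
                Poll::Pending
            }
        }
    }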

7. elchananHaas ◴[] No.45777348[source]
My view of this is that it's closer to the basic two-lock deadlock.

Thread 1 acquires A. Thread 2 acquires B. Thread 1 tries to acquire B. Thread 2 tries to acquire A.
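
In thread terms, that's the textbook case; a self-contained sketch (the sleeps just make the fatal interleaving all but certain):

    use std::sync::Mutex;
    use std::thread;
    use std::time::Duration;

    static A: Mutex<()> = Mutex::new(());
    static B: Mutex<()> = Mutex::new(());

    fn main() {
        let t1 = thread::spawn(|| {
            let _a = A.lock().unwrap();
            thread::sleep(Duration::from_millis(50));
            let _b = B.lock().unwrap(); // blocks waiting for thread 2
        });
        let t2 = thread::spawn(|| {
            let _b = B.lock().unwrap();
            thread::sleep(Duration::from_millis(50));
            let _a = A.lock().unwrap(); // blocks waiting for thread 1
        });
        t1.join().unwrap(); // never returns: deadlock
        t2.join().unwrap();
    }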

In this case, the role of "A" is being played by the front of the Mutex's lock queue. The role of "B" is being played by Tokio's actively executing task.

Based on this understanding, I agree that the surprising behavior is due to Tokio's Mutex/lock queue implementation. If this were an OS mutex and a thread waiting for it couldn't wake for some reason, the OS could wake a different thread waiting for that mutex. I think the difficulty in this approach has to do with how Rust's async is implemented. My guess is the algorithm for releasing a lock goes something like this:

1. Pop the head of the wait queue.

2. Poll the top-level tokio::spawn'ed task of the future that is holding the Mutex.

What you want is something like this:

    For each future in the wait queue (front to back):
        Poll the future.
        If success: break.
    ???Something if everything fails???

The reason this doesn't work has to do with how futures compose. Futures compile to states within a state machine. What happens when a future polled within the wait queue completes? How is control flow handed back to the caller?

I guess you might be able to have some fallback that polls the futures independently then polls the top level future to try and get things unstuck. But this could cause confusing behavior where futures are being polled even though no code path within your code is await'ing them. Maybe this is better though?

8. johnisgood ◴[] No.45777553[source]
Off-topic but that code looks quite... complicated as opposed to what I would write in Erlang, Elixir, Go, or even C. Maybe it is just me.
replies(1): >>45777865 #
9. filleduchaos ◴[] No.45777690{3}[source]
I feel there's a difference between a preference and a flaw. Rust has targets that make anything except inert futures simply unworkable, and in my opinion it's entirely valid for a programming language to prioritise those targets.
replies(1): >>45777971 #
10. jerf ◴[] No.45777865[source]
Erlang/Elixir and Go "solve" this problem by basically not giving you the rope to hang yourself in this particular way in the first place. This is a perfectly valid and sensible solution... but it is not the only solution. It means you're paying for some relatively expensive full locks that the Rust async task management is trying to elide, for what can be quite significant performance gains if you're doing a lot of small tasks.

It is good that not every language gives you this much control, offering easier options for when those are adequate; but it is also good that there is some set of decent languages that do give you this degree of control for when it is necessary, and it is good that we are not surrendering that space to just C and/or C++. Unfortunately such control comes with footguns, at least over certain spans of time. Perhaps someone will figure out a way to solve this problem in Rust in the future.

replies(1): >>45779910 #
11. Rusky ◴[] No.45777971{4}[source]
The requirement is that the futures are not separate heap allocations, not that they are inert.

It's not at all obvious that Rust's is the only possible design that would work here. I strongly suspect it is not.

In fact, early Rust did some experimentation with exactly the sort of stack layout tricks you would need to approach this differently. For example, see Graydon's post here about the original implementation of iterators, as lightweight coroutines: https://old.reddit.com/r/ProgrammingLanguages/comments/141qm...

replies(2): >>45778063 #>>45779540 #
12. filleduchaos ◴[] No.45778063{5}[source]
On the other hand, early Rust also for instance had a tracing garbage collector; it's far from obvious to me how relevant its discarded design decisions are supposed to be to the language it is today.
replies(1): >>45778413 #
13. Rusky ◴[] No.45778413{6}[source]
This one is relevant because it avoids heap allocation while running the iterator and for loop body concurrently. Which is exactly the kind of thing that `async` does.
replies(1): >>45778886 #
14. mjevans ◴[] No.45778715{3}[source]
Thank you. Every time I've tried to approach the concept of Rust's parallelism this is what rubs me the wrong way.

I haven't yet read a way to prove it's correct, or even to reasonably prove a given program's use is not going to block.

With more traditional threads my mental model is that _everything_ always has to be interrupt-able, have some form of engineer chosen timeout for a parallel operation, and address failure of operation in design.

I never see any of that in the toy examples that are presented as educational material. Maybe Rust's async also requires such careful design to be safely utilized.

replies(1): >>45779051 #
15. wahern ◴[] No.45778886{7}[source]
It avoids heap allocation in some situations. But in principle the exact same optimization could be done for stackful coroutines. Heck, right now in C I could stack-allocate an array and pass it to pthread_create as the stack for a new thread. To avoid an overlarge allocation I would need to know exactly how much stack is needed, but this is exactly the knowledge the Rust compiler already requires for async/await.

What people care about are semantics. async/await leaks implementation details. One of the reasons Rust does it the way it currently does is because the implementation avoids requiring support from, e.g., LLVM, which might require some feature work to support a deeper level of integration of async without losing what benefits the current implementation provides. Rust has a few warts like this where semantics are stilted in order to confine the implementation work to the high-level Rust compiler.

replies(2): >>45780838 #>>45782337 #
16. zenmac ◴[] No.45779051{4}[source]
Guess Rust is more built for memory safety than concurrency? Erlang maybe? Why can't we just have a language that is memory safe and built for concurrency, like OCaml and Erlang combined?
replies(2): >>45779560 #>>45781053 #
17. vlovich123 ◴[] No.45779540{5}[source]
If it’s not inert, how do you use async in the kernel or on microcontrollers? A non-inert implementation presumes a single runtime implementation within std+compiler, and is not usable in environments where you need to implement your own meaning of dispatch.
replies(2): >>45781492 #>>45782254 #
18. jitl ◴[] No.45779560{5}[source]
Are you looking for Gleam? Simple but powerful typed functional language for BEAM and JavaScript. It’s a bit high-level compared to OCaml in terms of needing a thick runtime and being somewhat far from machine code.

Really beautiful language design imo. Does a great job avoiding the typelevel brainfuck problem I have with Haskell.

https://gleam.run/

19. oconnor663 ◴[] No.45779712[source]
Rust futures are "just" structs with a poll() method. The poll() method is a function like any other, so it can have local variables on the stack as usual, but anything it wants to save between calls needs to be a field of the struct instead of a stack local. The magic of async/await is that the compiler figures out which of your async function's variables need to be fields on that struct, and it generated the struct and the poll method for you.

I have a blog series that goes into the concrete details if you like: https://jacko.io/async_intro.html
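
For a flavor of that transform, here's a rough hand-written sketch (not the compiler's actual output; `StepOne`/`StepTwo` are stand-ins for the awaited sub-futures):

    use std::future::Future;
    use std::pin::Pin;
    use std::task::{Context, Poll};

    struct StepOne;
    struct StepTwo { x: u32 }

    impl Future for StepOne {
        type Output = u32;
        fn poll(self: Pin<&mut Self>, _: &mut Context<'_>) -> Poll<u32> { Poll::Ready(1) }
    }
    impl Future for StepTwo {
        type Output = u32;
        fn poll(self: Pin<&mut Self>, _: &mut Context<'_>) -> Poll<u32> { Poll::Ready(self.x + 1) }
    }

    // Roughly the state machine generated for:
    //   async fn example() -> u32 { let x = step_one().await; step_two(x).await }
    enum Example {
        Start,
        AwaitingStepOne(StepOne), // suspended at the first .await
        AwaitingStepTwo(StepTwo), // `x` now lives inside StepTwo, not on the stack
        Done,
    }

    impl Future for Example {
        type Output = u32;
        fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
            loop {
                match &mut *self {
                    Example::Start => *self = Example::AwaitingStepOne(StepOne),
                    Example::AwaitingStepOne(f) => match Pin::new(f).poll(cx) {
                        Poll::Ready(x) => *self = Example::AwaitingStepTwo(StepTwo { x }),
                        Poll::Pending => return Poll::Pending,
                    },
                    Example::AwaitingStepTwo(f) => match Pin::new(f).poll(cx) {
                        Poll::Ready(v) => { *self = Example::Done; return Poll::Ready(v); }
                        Poll::Pending => return Poll::Pending,
                    },
                    Example::Done => panic!("polled after completion"),
                }
            }
        }
    }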

replies(1): >>45785003 #
20. johnisgood ◴[] No.45779910{3}[source]
> It means you're paying for some relatively expensive full locks that the Rust async task management is trying to elide, for what can be quite significant performance gains if you're doing a lot of small tasks.

The point of Erlang/Elixir is that it is as performant as possible, and Erlang's history is a testament to this. BEAM is wonderful, and really fast, along with the languages on it being ergonomic (OTP behaviors, supervisors, etc.).

replies(1): >>45781660 #
21. gf000 ◴[] No.45780306{3}[source]
I think an important takeaway here, which many often ignore, is that in language design, not having low-level control over something is sometimes just as important a design tradeoff as having it.

From that it also follows that it may not be too fruitful to try to tackle every domain there is with a single language only.

(With that said, I absolutely love sync Rust, and Go is definitely not a good example of an elegantly designed language, I am talking in a more general way here)

22. zozbot234 ◴[] No.45780838{8}[source]
In order to know for sure how much stack is needed (or to replace the stack with a static allocation, which used to be common on older machines and still today in deep embedded code, and even on GPU!), you must ensure that any functions you call within your thread are non-reentrant, or else that they resort to an auxiliary stack-like allocation if reentrancy is required. This is a fundamental constraint (not something limited to current LLVM) which in practice leads you right back into the "what color are your functions?" world.
23. kibwen ◴[] No.45781053{5}[source]
Rust is absolutely built for concurrency, even more so than for memory safety--it just so happens that memory safety is a prerequisite for thread safety. You're going to have a hard time finding any other industrial-strength language that statically prevents data races. If you can use Erlang, then sure, use Erlang. But if you can't use Erlang, and you need concurrency, you're not going to find a better candidate than Rust.
24. tux3 ◴[] No.45781492{6}[source]
I think the kernel and microcontroller use-case has been overstated.

A few bare metal projects use stackless coroutines (technically, resumable functions) for concurrency, but it has turned out to be a much smaller use-case than anticipated. In practice C and C++ coroutines are really not worth the pain they are to use, and Rust async has mostly taken off with heavy-duty executors like Tokio, which very much don't target tiny #![no_std] 16-bit microcontrollers.

The kernel actually doesn't use resumable functions for background work; it uses kernel threads. In the wider embedded world threads are also vastly more common than people might think, and the really low-end uniprocessor systems are usually happy to block. Since these tiny systems are not juggling dozens of requests per second that are blocking on I/O, they don't gain that much from coroutines anyway.

We mostly see bigger Rust projects use async when they have to handle concurrent requests that block on IO (network, FS, etc), and we mostly observe that the ecosystem is converging on tokio.

Threads are not free, but most embedded projects today that process requests in parallel — including the kernel — are already using them. Eager futures are more expensive than lazy futures, and less expensive than threads. They strike an interesting middle ground.

Lazy futures are extremely cheap at runtime. But we're paying a huge complexity cost in exchange, to benefit a very small user-base that hasn't really fully materialized as we hoped it would.

replies(1): >>45784839 #
25. jerf ◴[] No.45781660{4}[source]
This is a myth, from the old days when BEAM was the only thing that could juggle thousands of "processes" without losing performance, and even back then, people routinely missed that while BEAM could juggle those thousands of processes, each of them was individually not that fast. That is, BEAM's extremely high performance was only in one isolated thing, not high performance across the board.

Now BEAM is far from the only runtime juggling that many processes, but it remains a relatively slow VM. I rule-of-thumb it at 10x slower than C, making it a medium-performance VM at best, and you want to watch your abstraction layers in those nicer languages like Gleam, because further multiplicative slowdowns can really start to bite.

The first serious Go program I wrote was a replacement for something written in Erlang. There was no significant architectural improvement in the rewrite (it was already reasonably well-architected), and from the first deployment we went from four systems, sometimes struggling with the load spikes, to just one that could handle it all, even with BEAM being over a decade more mature and the Go clustering code being something I wrote over a few weeks rather than battle-tested, optimized code.

BEAM is good at managing concurrency, but it is slowish in other ways. It's better than the dynamic scripting languages like Python by a good amount but it is not performance-competitive with a modern compiled language.

26. dboreham ◴[] No.45781986[source]
Which is why "async" is a pox on our house. System threads can and do address these edge issues. User level concurrency generally doesn't (perhaps with the exceptions of golang and erlang).
27. Rusky ◴[] No.45782254{6}[source]
"Not inert" does not at all imply "a single runtime within std+compiler." You've jumped way too far in the opposite direction there.

The problem is that the particular interface Rust chose for controlling dispatch is not granular enough. When you are doing your own dispatch, you only get access to separate tasks, but for individual futures you are at the mercy of combinators like `select!` or `FuturesUnordered` that only have a narrow view of the system.

A better design would continue to avoid heap allocations and allow you to do your own dispatch, but operate in terms of individual suspended leaf futures. Combinators like `join!`/`select!`/etc. would be implemented more like they are in thread-based systems, waiting for sub-tasks to complete, rather than being responsible for driving them.

28. Rusky ◴[] No.45782337{8}[source]
> in principle the exact same optimization could be done for stackful coroutines.

Yes, I totally agree, and this is sort of what I imagine a better design would look like.

> One of the reasons Rust does it the way it currently does is because the implementation avoids requiring support from, e.g., LLVM

This I would argue is simply a failure of imagination. All you need from the LLVM layer is tail calls, and then you can manage the stack layout yourself in essentially the same way Rust manages Future layout.

You don't even need arbitrary tail calls. The compiler can limit itself to the sorts of things LLVM asks for- specific calling convention, matching function signatures, etc. when transferring control between tasks, because it can store most of the state in the stack that it laid out itself.

29. kibwen ◴[] No.45784839{7}[source]
> it has turned out to be a much smaller use-case than anticipated

Well, no: at the time of the design of Rust's async MVP, everyone was pretty well aware that the vast majority of the users would be writing webservers, and that the embedded use case would be a decided minority, if it ever existed at all. That Embassy exists, and that its ecosystem is as vibrant as it is, is, if anything, an unexpected triumph.

But regardless of how many people were actually expected to use it in practice, the underlying philosophy remained thus: there exist no features of Rust-the-language that are incompatible with no_std environments (e.g. Rust goes well out of its way, and introduces a lot of complexity, to make things like closures work given such constraints), and it would be exceptional and unprecedented for Rust to violate this principle when it comes to async.

replies(1): >>45785813 #
30. Veserv ◴[] No.45785003{3}[source]
I see. The Rust implementation effectively splats out the transitive closure of all your callee stack frames upfront, which is what would enable continuing previously executing async functions.
31. tux3 ◴[] No.45785813{8}[source]
Point taken, I might have formed the wrong impression at the time.

With my C++ background, I'm very much at home with that philosophy, but I think there is room for nuance in how strictly orthodox we are.

C++ does have optional language features that introduce some often unwelcome runtime overhead, like RTTI and unwinding.

Rust does not come configured for freestanding environments out of the box either. Like C++, you are opting out of language features like unwinding as well as the standard library when going freestanding.

I want to affirm that I'm convinced Rust is great for embedded. It's more that I mostly love async when I get to use it for background I/O with a full-fledged, work-stealing, thread-per-core marvel of engineering like tokio!

In freestanding Rust the I/O code is platform specific, suddenly I'd have to write the low-level async code myself, and it's not clear this makes the typical embedded project that much higher performance, or all that easy to maintain.

So, I don't want to say anything too radical. But I think the philosophy doesn't have to be as clear-cut as "no language feature may ever be incompatible with no_std". Offering a std-only language feature is not necessarily closing a door to embedded. We sort of already make opt-out concessions to have a friendlier experience for most people.

(Apologies for the wall of text)