Futurelock: A subtle risk in async Rust

(rfd.shared.oxide.computer)

421 points bcantrill | 1 comments | 31 Oct 25 16:49 UTC | HN request time: 0.235s | source

This RFD describes our distillation of a really gnarly issue that we hit in the Oxide control plane.[0] Not unlike our discovery of the async cancellation issue[1][2][3], this is larger than the issue itself -- and worse, the program that hits futurelock is correct from the programmer's point of view. Fortunately, the surface area here is smaller than that of async cancellation and the conditions required to hit it can be relatively easily mitigated. Still, this is a pretty deep issue -- and something that took some very seasoned Rust hands quite a while to find.

[0] https://github.com/oxidecomputer/omicron/issues/9259

[1] https://rfd.shared.oxide.computer/rfd/397

[2] https://rfd.shared.oxide.computer/rfd/400

[3] https://www.youtube.com/watch?v=zrv5Cy1R7r4

Show context

jacquesm ◴[31 Oct 25 20:43 UTC] No.45776483[source]▶

>>45774086 (OP) #

If any rust designers are lurking about here: what made you decide to go for the async design pattern instead of the actor pattern, which - to me at least - seems so much cleaner and so much harder to get wrong?

Ever since I started using Erlang it felt like I finally found 'the right way' when before then I did a lot of work with sockets and asynchronous worker threads. But even though it usually worked as advertised it had a large number of really nasty pitfalls which the actor model seemed to - effortlessy - step aside.

So I'm seriously wondering what the motivation was. I get why JS uses async, there isn't any other way there, by the time they added async it was too late to change the fundamentals of the language to such a degree. But rust was a clean slate.

replies(5): >>45776498 #>>45776569 #>>45776637 #>>45776798 #>>45777596 #

sunshowers ◴[31 Oct 25 20:45 UTC] No.45776498[source]▶

>>45776483 #

Not a Rust designer, but a big motivation for Rust's async design was wanting it to work on embedded, meaning no malloc and no threads. This unfortunately precludes the vast majority of the design space here, from active futures as seen in JS/C#/Go to the actor model.

You can write code using the actor model with Tokio. But it's not natural to do so.

replies(5): >>45776630 #>>45776683 #>>45777051 #>>45777075 #>>45778490 #

fpoling ◴[31 Oct 25 21:47 UTC] No.45777075[source]▶

>>45776498 #

Rust async still uses a native stack which just a form of memory allocator that uses LIFO order. And controlling stack usage in the embedding world is just as important as not relying on the system allocator.

So its a pity that Rust async design tried so hard to avoid any explicit allocations rather than using an explicit allocator that embedding can use to preallocate and reuse objects.

replies(2): >>45777326 #>>45777385 #

tony69 ◴[31 Oct 25 22:28 UTC] No.45777385[source]▶

>>45777075 #

Stack allocation/deallocation does not fragment memory, that’s a yuge difference for embedded systems and the main reason to avoid the heap

replies(1): >>45779568 #

fpoling ◴[01 Nov 25 06:03 UTC] No.45779568[source]▶

>>45777385 #

Even with the stack the memory can fragment. Just consider one created 10 features on the stack and the last completed last. Then memory for the first 9 will not be released until the last completes.

This problem does not happen with a custom allocator where things to allocate are of roughly the same size and allocator uses same-sized cells to allocate.

replies(1): >>45779721 #

1. jacquesm ◴[01 Nov 25 06:53 UTC] No.45779721[source]▶

>>45779568 #

Indeed, arena allocators are quite fast and allow you to really lock down the amount of memory that is in use for a particular kind of data. My own approach in the embedded world has always been to simply pre-allocate all of my data structures. If it boots it will run. Dynamic allocation of any kind is always going to have edge cases that will cause run-time issues. Much better to know that you have a deterministic system.

↑