    Futurelock: A subtle risk in async Rust

    (rfd.shared.oxide.computer)
    421 points bcantrill | 12 comments

    This RFD describes our distillation of a really gnarly issue that we hit in the Oxide control plane.[0] Not unlike our discovery of the async cancellation issue[1][2][3], this is larger than the issue itself -- and worse, the program that hits futurelock is correct from the programmer's point of view. Fortunately, the surface area here is smaller than that of async cancellation and the conditions required to hit it can be relatively easily mitigated. Still, this is a pretty deep issue -- and something that took some very seasoned Rust hands quite a while to find.

    [0] https://github.com/oxidecomputer/omicron/issues/9259

    [1] https://rfd.shared.oxide.computer/rfd/397

    [2] https://rfd.shared.oxide.computer/rfd/400

    [3] https://www.youtube.com/watch?v=zrv5Cy1R7r4

    1. Sytten ◴[] No.45776277[source]
    I am wondering if there is a larger RFC for Rust to force users to not hold a variable across await points.

    In my mind futurelock is similar to keeping a sync lock across an await point. We have nothing right now to force a drop and I think the solution to that problem would help here.

    replies(5): >>45776433 #>>45776480 #>>45776533 #>>45777165 #>>45786112 #
    2. cogman10 ◴[] No.45776433[source]
    The idea that has been batted around is called "async drop" [1].

    And it looks like it's still just an unaddressed, well-known problem [2].

    Honestly, once the Mozilla sackening of rust devs happened it seems like the language has been practically rudderless. The RFC system seems almost dead as a lot of the main contributors are no longer working on rust.

    This initiative hasn't had motion since 2021. [3]

    [1] https://rust-lang.github.io/async-fundamentals-initiative/ro...

    [2] https://rust-lang.github.io/async-fundamentals-initiative/

    [3] https://github.com/rust-lang/async-fundamentals-initiative

    replies(2): >>45776530 #>>45781483 #
    3. sunshowers ◴[] No.45776480[source]
    Note that forcing a drop of a lock guard has its own issues, particularly around leaving the guarded data in an invalid state. I cover this a bit in my talk that Bryan linked to in the OP [1].

    [1] timestamped: https://youtu.be/zrv5Cy1R7r4?t=1067

    4. raggi ◴[] No.45776530[source]
    Those pages are out of date, and AsyncDrop is in progress: https://github.com/rust-lang/rust/issues/126482

    I think "practically rudderless" here is fairly misinformed and a little harmful/rude to all the folks doing tons of great work still.

    It's a shame there are some stale pages around and so on, but they're not good measures of the state of the project or ecosystem.

    The problem of holding objects across await points is also partially addressed by this unstable lint marker, which is used by some projects: https://dev-doc.rust-lang.org/unstable-book/language-feature...

    You also get a similar effect in multi-threaded runtimes by not arbitrarily making everything in your object model Send and instead designing your architecture so that most things between wake-ups don't become arbitrarily movable references.

    These aren't perfect mitigations, but some tools.

    replies(2): >>45776684 #>>45776696 #
    5. ameliaquining ◴[] No.45776533[source]
    There's an existing lint that lets you prohibit instances of specific types from being held across await points: https://rust-lang.github.io/rust-clippy/stable/index.html#aw...
    6. bigstrat2003 ◴[] No.45776684{3}[source]
    In fairness, if you're a layman to the rust development process (as I am, so I'm speaking from personal experience here) it's damn near impossible to figure out the status of things. There are tracking issues, RFCs, etc., which is very confusing as an outsider and gives no obvious place to look to find out the current status of a proposal. I'm sure there is a logic to it and that if I spent the time to learn it, it would make sense. But it is really hard to approach for someone like me.
    replies(1): >>45781525 #
    7. cogman10 ◴[] No.45776696{3}[source]
    > I think "practically rudderless" here is fairly misinformed and a little harmful/rude to all the folks doing tons of great work still.

    That great work is mostly opaque on the outside.

    What's been noticeable as an observer is that a lot of the well known names associated with rust no longer work on it and there's been a large amount of turnover around it.

    That manifests in things like this case where work was in progress up until ~2021 and then was ultimately backburnered while the entire org was reshuffled. (I'd note the dates on the MCP as Feb 2024).

    I can't tell exactly how much work or what direction it went in from 2021 to 2024 but it does look apparent that the work ultimately got shifted between multiple individuals.

    I hope rust is in a better spot. But I also don't think I was being unfair in pointing out how much momentum got wrecked when Mozilla pulled support.

    replies(1): >>45777211 #
    8. amluto ◴[] No.45777165[source]
    I’m not convinced that this can help in a meaningful way.

    Fundamentally, if you have two coroutines (or cooperatively scheduled threads or whatever), and one of them holds a lock, and the other one is awaiting the lock, and you don’t schedule the first one, you’re stuck.
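
    A minimal sketch of that scenario, using only the standard library and manual polling in place of a real runtime. Everything here is hypothetical and invented for illustration: the toy flag-based lock behind `Acquire`, the `YieldOnce` helper, and the `holder`, `waiter`, and `demo` functions. None of it is taken from the RFD or from any async runtime.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::{pin, Pin};
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Waker that does nothing; we drive the futures by hand below.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Hypothetical helper: returns Pending on the first poll, Ready on the second.
struct YieldOnce(bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

// Hypothetical toy lock: a shared flag. Acquiring succeeds only if the flag is clear.
struct Acquire(Rc<Cell<bool>>);
impl Future for Acquire {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0.get() {
            Poll::Pending
        } else {
            self.0.set(true);
            Poll::Ready(())
        }
    }
}

// Takes the lock, suspends once while holding it, then releases it.
async fn holder(lock: Rc<Cell<bool>>) {
    Acquire(lock.clone()).await;
    YieldOnce(false).await;
    lock.set(false);
}

// Waits for the lock, then releases it immediately.
async fn waiter(lock: Rc<Cell<bool>>) {
    Acquire(lock.clone()).await;
    lock.set(false);
}

// Returns (waiter was stuck while holder went unpolled, both finished afterwards).
fn demo() -> (bool, bool) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let lock = Rc::new(Cell::new(false));
    let mut a = pin!(holder(lock.clone()));
    let mut b = pin!(waiter(lock.clone()));

    // Start the holder: it takes the lock and suspends.
    assert!(a.as_mut().poll(&mut cx).is_pending());
    // Poll only the waiter: it can never make progress.
    let stuck = (0..1000).all(|_| b.as_mut().poll(&mut cx).is_pending());
    // Resume the holder so it releases the lock; now the waiter can finish.
    let a_done = a.as_mut().poll(&mut cx).is_ready();
    let b_done = b.as_mut().poll(&mut cx).is_ready();
    (stuck, a_done && b_done)
}
```

    Calling `demo()` shows the waiter staying pending for as long as the holder is not polled, and both completing once the holder is polled again. The bound of 1000 polls is arbitrary; no number of polls of the waiter alone would help.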

    I wonder if there’s a form of structured concurrency that would help. If I create two futures and start both of them (in Rust this means polling each one once) but do not continue to poll both, then I’m sort of making a mistake.

    So imagine a world where, to poll a future at all, I need to have a nursery, and the nursery is passed in from my task and down the call stack. When I create a future, I can pass in my nursery, but that future then gets an exclusive reference to my nursery until it’s complete or cancelled. If I want to create more than one future that are live concurrently, I need to create a FutureGroup (that gets an exclusive reference to my nursery) and that allows me to create multiple sub-nurseries that can be used to make futures but cannot be used to poll them; instead I poll the FutureGroup.

    (I have yet to try using an async/await system or a reactor or anything of the sort that is not very easy to screw up. My current pet peeve is this pattern:

        data = await thingy.read()
    
    What if thingy.read() succeeds but I am cancelled? This gets nasty in most programming languages. Python: the docs on when I can get cancelled are almost nonexistent, and it’s not obviously possible to catch the CancelledError such that I still have data and can therefore save it somewhere so it’s not lost. Rust: what if thingy thinks it has returned the data but I’m never polled again? Maybe this can’t happen if I’m careful, but that requires more thought than I’m really happy with.)
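
    A rough sketch of the Rust half of that worry, using only the standard library and manual polling. The `read` function below is a hypothetical stand-in for `thingy.read()`, and `YieldOnce`, `Queue`, and `cancelled_read_loses_item` are likewise invented for illustration. It assumes a reader that removes an item from its source and then reaches another await point before returning.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::future::Future;
use std::pin::{pin, Pin};
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Waker that does nothing; the future is polled by hand below.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Hypothetical helper: returns Pending on the first poll, Ready on the second.
struct YieldOnce(bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

type Queue = Rc<RefCell<VecDeque<u32>>>;

// Hypothetical stand-in for `thingy.read()`: the item leaves the queue
// before the future reaches its last await point.
async fn read(queue: Queue) -> Option<u32> {
    let item = queue.borrow_mut().pop_front();
    YieldOnce(false).await; // the caller may drop us while we are suspended here
    item
}

// Returns the queue length before and after a read that is cancelled midway.
fn cancelled_read_loses_item() -> (usize, usize) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let queue: Queue = Rc::new(RefCell::new(VecDeque::from([1, 2, 3])));
    let before = queue.borrow().len();
    {
        let mut fut = pin!(read(queue.clone()));
        // First poll: the item is popped, then the future suspends.
        assert!(fut.as_mut().poll(&mut cx).is_pending());
        // Leaving this scope drops the future, which is how cancellation
        // happens in Rust; the popped item is dropped along with it.
    }
    let after = queue.borrow().len();
    (before, after)
}
```

    From the queue's point of view the item was delivered, yet no caller ever received it. Reordering `read` so that no await point follows the removal would avoid the loss in this toy case, which is roughly the kind of care the comment describes.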
    9. raggi ◴[] No.45777211{4}[source]
    The language team tends to look at these kinds of challenges and drive them to a root cause, which spins off a tree of work to adjust the core language to support what's required by the higher-level pieces. Once that work is done, the higher-level projects are unblocked (example: RPIT for async drop).

    That's not always super visible if you're not following the working groups or in contact with folks working on the stuff. It's entirely fair that they're prioritizing getting work done over explaining low-level language challenges to everyone everywhere.

    I think you're seeing a lack of data and trying to use that as a justification to fit a story that you like, more than seeing data that is derivative of the story that you like. Of course some people were horribly disrupted by the changes, but language usage also expanded substantially during and since that time, and there are many team members employed by many other organizations, and many independents too.

    And there are more docs, anyway:

    https://rust-lang.github.io/rust-project-goals/2024h2/async....
    https://rust-lang.github.io/rust-project-goals/2025h1/async....
    https://rust-lang.github.io/rust-project-goals/2025h2/field-...
    https://rust-lang.github.io/rust-project-goals/2025h2/evolvi...
    https://rust-lang.github.io/rust-project-goals/2025h2/goals....

    10. kibwen ◴[] No.45781483[source]
    While the Mozilla layoffs were a stressful time with a lot of uncertainty involved, in the end they don't appear to have had a deleterious effect on Rust development. Today the activity in the Rust repo is as high as it's ever been (https://github.com/rust-lang/rust/graphs/contributors) and the governance of the project is more organized and healthy than it's ever been (https://blog.rust-lang.org/2025/10/15/announcing-the-new-rus...). The language certainly isn't rudderless, it's just branched out beyond the RFC system (https://blog.rust-lang.org/2025/10/28/project-goals-2025h2/). RFCs are still used for major things as a form of documentation, validation, and community alignment, but doing design up-front in RFCs has turned out to be an extremely difficult process. Instead, it's evolving toward a system where major things get implemented first as experiments, whose design later guides the eventual RFC.
    11. kibwen ◴[] No.45781525{4}[source]
    If you want to find out the status of something, the best bet is to go to the Rust Zulip and ask around: https://rust-lang.zulipchat.com/ . Most Rust initiatives are pushed forward by volunteers who are happy to talk about what they're working on, but who only periodically write status reports on tracking issues (usually in response to someone asking them what the status is). Rust isn't a company where documentation is anyone's job, it's just a bunch of people working on stuff, for better or worse.
    12. mechanical_berk ◴[] No.45786112[source]
    I agree. It seems like this bug arises because one Future is awaited while another is ignored. I have seen this sort of bug a lot.

    So maybe all that is needed is a lint that warns if you keep a Future (or a reference to one) across an await point? The Future you are awaiting wouldn't count of course. Is there some case where this doesn't work?