Futurelock: A subtle risk in async Rust

(rfd.shared.oxide.computer)

435 points bcantrill | 2 comments | 31 Oct 25 16:49 UTC

This RFD describes our distillation of a really gnarly issue that we hit in the Oxide control plane.[0] Not unlike our discovery of the async cancellation issue[1][2][3], this is larger than the issue itself -- and worse, the program that hits futurelock is correct from the programmer's point of view. Fortunately, the surface area here is smaller than that of async cancellation and the conditions required to hit it can be relatively easily mitigated. Still, this is a pretty deep issue -- and something that took some very seasoned Rust hands quite a while to find.

[0] https://github.com/oxidecomputer/omicron/issues/9259

[1] https://rfd.shared.oxide.computer/rfd/397

[2] https://rfd.shared.oxide.computer/rfd/400

[3] https://www.youtube.com/watch?v=zrv5Cy1R7r4


levodelellis ◴[31 Oct 25 23:49 UTC] No.45777969[source]▶

>>45774086 (OP) #

In October alone I've seen 5+ articles and comments about multi-threading, and I don't know why.

I've always said that if your code uses locks or atomics, it's wrong. Everyone says I'm wrong, but you get things like what's described in the article. I'd like to recommend a solution, but there's pretty much no reasonable way to implement multi-threading when you're not an expert. I've heard Erlang and Elixir are good, but I haven't tried them, so I can't really comment.

replies(3): >>45777993 #>>45778558 #>>45779598 #

umvi ◴[31 Oct 25 23:52 UTC] No.45777993[source]▶

>>45777969 #

> I've always said that if your code uses locks or atomics, it's wrong. Everyone says I'm wrong, but you get things like what's described in the article.

OK, so say you are simulating high-energy photons (x-rays) flowing through a 3D patient volume. You need to simulate 2 billion particles propagating through the patient in order to get an accurate estimate of how the radiation is distributed. How do you accomplish this without locks or atomics, and without the simulation taking 100 hours to run? Obviously it would take forever to simulate 1 particle at a time, but without locks or atomics the particles will step on each other's toes when updating the radiation distribution in the patient. I suppose you could have 2 billion copies of the patient's volume in memory, where each particle gets its own private copy, and then merge them all at the end...
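The private-copy idea at the end of this comment is usually applied per worker thread rather than per particle. A minimal sketch of that variant, assuming a hypothetical `deposit` function in place of real particle transport and a flat `Vec<u64>` as the dose grid (neither comes from the thread):

```rust
use std::thread;

/// Hypothetical stand-in for a particle transport step: maps a particle id
/// to the voxel it deposits energy in and the amount deposited (1 to 4).
fn deposit(particle: u64, voxels: usize) -> (usize, u64) {
    // Cheap deterministic mixing; a real simulation would do Monte Carlo
    // transport here.
    let h = particle.wrapping_mul(0x9E37_79B9_7F4A_7C15);
    ((h >> 32) as usize % voxels, 1 + h % 4)
}

/// Simulates `particles` particles on `workers` threads. Each thread owns a
/// private dose grid, so the hot loop touches no shared mutable state.
fn simulate(particles: u64, voxels: usize, workers: u64) -> Vec<u64> {
    let partials: Vec<Vec<u64>> = thread::scope(|s| {
        let handles: Vec<_> = (0..workers)
            .map(|w| {
                s.spawn(move || {
                    let mut grid = vec![0u64; voxels];
                    // Worker `w` handles particles w, w + workers, w + 2 * workers, ...
                    let mut p = w;
                    while p < particles {
                        let (voxel, energy) = deposit(p, voxels);
                        grid[voxel] += energy;
                        p += workers;
                    }
                    grid
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    // Merge step: sum the per-worker grids into one result.
    let mut total = vec![0u64; voxels];
    for grid in partials {
        for (t, g) in total.iter_mut().zip(grid) {
            *t += g;
        }
    }
    total
}
```

The cost is one grid per worker instead of one shared grid, which may or may not be affordable for a large volume. The code contains no explicit locks or atomics, although joining threads still synchronizes inside the standard library.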

replies(1): >>45778046 #

levodelellis ◴[01 Nov 25 00:02 UTC] No.45778046[source]▶

>>45777993 #

From my understanding, this talk describes how he implemented a solution to a similar problem: https://www.youtube.com/watch?v=Kvsvd67XUKw

I'm saying that if you're not writing multi-threaded code every day, use a library. The library can use atomics/locks, but you shouldn't use them directly. If the library is designed well, it'd be impossible to deadlock.

replies(1): >>45789034 #

jstimpfle ◴[02 Nov 25 09:33 UTC] No.45789034[source]▶

>>45778046 #

If you take programming seriously, learn it.

With a library that encapsulates a small number of patterns (like message passing), you'll be very limited. If you never start learning about lower-level multi-threading issues, you'll never learn them. And it's not _that_ hard.

I'm not writing multi-threaded code every day (far from it), but often enough that I can write useful things (using shared memory, atomics, mutexes, condition variables, etc). And I'm looking forward to learning more, understanding various issues better, and learning new patterns.

replies(1): >>45791701 #

1. levodelellis ◴[02 Nov 25 17:00 UTC] No.45791701[source]▶

>>45789034 #

I do write code that uses multi-threading every day and usually I write a few lockless functions every month for the in-house library I use. I was considering writing an article on atomics after all the mistakes and bad code I've seen.

A problem with writing an article is that if I don't show real code, people might think I'm exaggerating; if I do show real code, it'd look like I'm calling someone a bad programmer.

replies(1): >>45791962 #

2. jstimpfle ◴[02 Nov 25 17:35 UTC] No.45791962[source]▶

>>45791701 (TP) #

I've certainly seen my share of buggy multi-threaded programming. On the other hand, that's nothing compared to all the buggy and bad code I've seen overall. And I don't think it's going to get better by telling people not to even try.

I'm very doubtful that multi-threading can be abstracted behind a library. Simple message passing can cover a lot of use cases, but far from everything. I've also seen the JavaScript model work fine, but it's not real multi-threading (no parallelism). As for async frameworks, they too are restrictive, and they come with a lot of complexity.
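For readers unfamiliar with the "simple message passing" pattern mentioned here, a minimal sketch using only Rust's standard library follows. The `Msg` enum, `spawn_counter`, and `sum_via_messages` are hypothetical names chosen for illustration. One thread owns the state and every other thread talks to it over a channel.

```rust
use std::sync::mpsc;
use std::thread;

/// Messages understood by the hypothetical counter-owning thread.
enum Msg {
    Add(u64),
    Get(mpsc::Sender<u64>),
}

/// Spawns a thread that exclusively owns a counter and returns a sender
/// that other threads can use to talk to it.
fn spawn_counter() -> mpsc::Sender<Msg> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let mut total = 0u64;
        // The loop ends once every Sender has been dropped.
        for msg in rx {
            match msg {
                Msg::Add(n) => total += n,
                Msg::Get(reply) => {
                    let _ = reply.send(total);
                }
            }
        }
    });
    tx
}

/// Has each of `producers` threads send 1..=n, then asks for the total.
fn sum_via_messages(producers: u64, n: u64) -> u64 {
    let counter = spawn_counter();
    thread::scope(|s| {
        for _ in 0..producers {
            let tx = counter.clone();
            s.spawn(move || {
                for i in 1..=n {
                    tx.send(Msg::Add(i)).unwrap();
                }
            });
        }
    });
    // All producers have finished, so every Add is queued ahead of this Get.
    let (reply_tx, reply_rx) = mpsc::channel();
    counter.send(Msg::Get(reply_tx)).unwrap();
    reply_rx.recv().unwrap()
}
```

This covers the single-owner case cleanly. It also shows the limitation raised above: every access to the counter becomes a round trip through one thread, which does not suit workloads that need many threads updating large shared structures in place.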
