One caveat though - using a normal std Mutex within an async environment is an antipattern and should not be done - you can cause all sorts of issues and, I believe, even deadlock your entire program. You should be using tokio's sync primitives (e.g. tokio's Mutex), which can yield to the reactor when they need to block. Otherwise the thread that's running the future blocks forever waiting for that mutex, and that reactor never does anything else, which isn't how tokio is designed.
So the compiler is warning about one problem, but you also have to know to be careful not to call blocking functions in an async function.
This is simply not true, and the tokio documentation says as much:
"Contrary to popular belief, it is ok and often preferred to use the ordinary Mutex from the standard library in asynchronous code."
https://docs.rs/tokio/latest/tokio/sync/struct.Mutex.html#wh...
Rust can use that type information and lifetimes to figure out when it's safe and when not.
A type is “Send” if it can be moved from one thread to another; it is “Sync” if it can be safely accessed from multiple threads at the same time (equivalently, a &T reference is Send).
These traits are automatically applied whenever the compiler knows it is safe to do so. In cases where automatic application is not possible, the developer can explicitly declare a type to have these traits, but doing so is unsafe (requires the ‘unsafe’ keyword and everything that entails).
You can read more in the Rustonomicon, if you are interested: https://doc.rust-lang.org/nomicon/send-and-sync.html
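For a concrete sketch of that explicit opt-in (MyHandle is a made-up type, purely for illustration):

// Raw pointers are neither Send nor Sync, so the compiler will not
// auto-derive either trait for this struct.
struct MyHandle {
    ptr: *mut u8,
}

// Opting in manually requires `unsafe`: we are promising the compiler
// that moving or sharing MyHandle across threads really is sound.
unsafe impl Send for MyHandle {}
unsafe impl Sync for MyHandle {}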
True. I used std::sync::Mutex with tokio and after a few days my API would not respond unless I restarted the container. I was under the impression that if it compiles, it's gonna just work (fearless concurrency), which is usually the case.
https://docs.rs/tokio/latest/tokio/task/fn.spawn.html
If you want to run everything on the same thread then LocalSet enables that. See how its spawn_local function does not include the Send bound.
https://docs.rs/tokio/latest/tokio/task/struct.LocalSet.html
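Rough sketch of what that looks like (assumes tokio with the macros and rt features enabled):

use std::rc::Rc; // Rc is !Send
use tokio::task::LocalSet;

#[tokio::main(flavor = "current_thread")]
async fn main() {
    let local = LocalSet::new();
    local
        .run_until(async {
            // A !Send value: tokio::spawn would reject this future,
            // but spawn_local has no Send bound.
            let data = Rc::new(42);
            tokio::task::spawn_local(async move {
                println!("{data}");
            })
            .await
            .unwrap();
        })
        .await;
}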
The compiler knows the Future doesn't implement the Send trait because MutexGuard is not Send and it crosses await points.
Then tokio, the async runtime, requires that the futures it runs are Send, because it may move them to another thread.
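You can see that requirement directly in the signature of tokio::spawn (abridged from the docs linked above):

pub fn spawn<F>(future: F) -> JoinHandle<F::Output>
where
    F: Future + Send + 'static,
    F::Output: Send + 'static,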
This is how Rust safety works. The internals of std, tokio and other low level libraries are unsafe but they expose interfaces that are impossible to misuse.
- The return type of Mutex::lock() is a MutexGuard, which is a smart pointer type that 1) implements Deref so it can be dereferenced to access the underlying data, 2) implements Drop to unlock the mutex when the guard goes out of scope, and 3) implements !Send so the compiler knows it is unsafe to send between threads: https://doc.rust-lang.org/std/sync/struct.MutexGuard.html
- Rust's implementation of async/await works by transforming an async function into a state machine object implementing the Future trait. The compiler generates an enum that stores the current state of the state machine and all the local variables that need to live across yield points, with a poll function that (synchronously) advances the coroutine to the next yield point: https://doc.rust-lang.org/std/future/trait.Future.html
- In Rust, a composite type like a struct or enum automatically implements Send if all of its members implement Send.
- An async runtime that can move tasks between threads requires task futures to implement Send.
So, in the example here: because the author held a lock across an await point, the compiler must store the MutexGuard smart pointer as a field of the Future state machine object. Since MutexGuard is !Send, the future also is !Send, which means it cannot be used with an async runtime that moves tasks between threads.
If the author releases the lock (i.e. drops the lock guard) before awaiting, then the guard does not live across yield points and thus does not need to be persisted as part of the state machine object -- it will be created and destroyed entirely within the span of one call to Future::poll(). Thus, the future object can be Send, meaning the task can be migrated between threads.
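Rough sketch of both shapes (some_async_op is just a stand-in for any await):

use std::sync::Mutex;

async fn some_async_op() {} // placeholder

// Not Send: the MutexGuard lives across the await, so it must be
// stored as a field of the generated future's state machine.
async fn holds_lock_across_await(mutex: &Mutex<i32>) {
    let guard = mutex.lock().unwrap();
    some_async_op().await;
    println!("{}", *guard);
}

// Send: the guard is created and dropped entirely within one poll,
// so it never becomes part of the future at all.
async fn drops_lock_before_await(mutex: &Mutex<i32>) {
    let value = {
        let guard = mutex.lock().unwrap();
        *guard
    }; // guard dropped here, before the await
    some_async_op().await;
    println!("{value}");
}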
> The exact behavior on locking a mutex in the thread which already holds the lock is left unspecified. However, this function will not return on the second call (it might panic or deadlock, for example).
Or your server is heavily contended enough that all worker threads are blocked on this mutex and no reactor can make forward progress.
There are absolutely situations where tokio's Mutex and RwLock are useful, but the vast majority of the time you shouldn't need them.
let mut guard = mutex.lock().await;
// guard.data is Option<T>, Some to begin with
let data = guard.data.take(); // guard.data is now None
let new_data = process_data(data).await;
guard.data = Some(new_data); // guard.data is Some again
Then you could cancel the future at the await point in between, while the lock is held, and as a result guard.data will not be restored to Some. In the Rust community, cancellation is pretty well-established nomenclature for this.
Hopefully the video of my talk will be up soon after RustConf, and I'll make a text version of it as well for people that prefer reading to watching.
let data = mutex.lock().unwrap().take();
let new_data = process_data(data).await;
*mutex.lock().unwrap() = Some(new_data);
Here you are using a traditional lock, and a cancellation at process_data results in the mutex holding exactly the undesired state you're worried about. It's a general footgun of cancellation and asynchronous tasks: at every await boundary your data has to be in some kind of valid, internally consistent state, because the await may never return. To fix this more robustly you'd need the async drop language feature. Tokio MutexGuards are Send, unfortunately, so they can be held across await points and are really prone to cancellation bugs.
(There's a related discussion about panic-based cancellations and mutex poisoning, which std's Mutex has but Tokio's doesn't.)
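Until async drop exists, one partial workaround is a synchronous drop guard that restores the invariant even when the future is dropped at the await. This is only a sketch - PutBack and process_data_in_place are names I made up, not from any library:

use std::sync::Mutex;

struct PutBack<'a, T> {
    slot: &'a Mutex<Option<T>>,
    value: Option<T>,
}

impl<T> Drop for PutBack<'_, T> {
    fn drop(&mut self) {
        // Runs on normal exit *and* on cancellation: put the value back.
        if let Some(v) = self.value.take() {
            *self.slot.lock().unwrap() = Some(v);
        }
    }
}

async fn process(mutex: &Mutex<Option<String>>) {
    let mut checked_out = PutBack {
        slot: mutex,
        value: mutex.lock().unwrap().take(),
    };
    if let Some(data) = checked_out.value.as_mut() {
        // Mutate in place, so a cancellation at this await still leaves
        // `checked_out` holding something for Drop to restore.
        process_data_in_place(data).await;
    }
    // `checked_out` dropped here: Drop writes the value back into the mutex.
}

async fn process_data_in_place(_data: &mut String) {} // placeholder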
[1] spawn_local does exist, though I guess most people don't use it.
The generally recommended alternative is message passing/channels/"actor model" where there's a single owner of data which ensures cancellation doesn't occur -- or, at least that if cancellation happens the corresponding invalid state is torn down as well. But that has its own pitfalls, such as starvation.
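A hedged sketch of that actor shape (all names here are illustrative):

use tokio::sync::{mpsc, oneshot};

enum Msg {
    Add(i32),
    Get(oneshot::Sender<i32>),
}

// The actor task is the sole owner of the state, so there is no lock
// guard for a cancelled caller to abandon in a half-updated state.
async fn run_actor(mut rx: mpsc::Receiver<Msg>) {
    let mut state = 0;
    while let Some(msg) = rx.recv().await {
        match msg {
            Msg::Add(n) => state += n,
            Msg::Get(reply) => {
                let _ = reply.send(state); // caller may have gone away
            }
        }
    }
}

#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::channel(16);
    tokio::spawn(run_actor(rx));

    tx.send(Msg::Add(5)).await.unwrap();
    let (reply_tx, reply_rx) = oneshot::channel();
    tx.send(Msg::Get(reply_tx)).await.unwrap();
    assert_eq!(reply_rx.await.unwrap(), 5);
}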
This is all very unsatisfying, unfortunately.
If anything, that's a disadvantage. You want your health monitoring to be the canary, not something that keeps on trucking even if the system is no longer doing useful work. (See the classic safety-critical software failure of 'I need a watchdog... I'll just feed it regularly in an isolated task'.)
You can definitely argue that developers should think about await points the same way they think about letting go of the mutex entirely, in case cancellation happens. Are mutexes conducive to that kind of thinking? Practically, I've found this to be very easy to get wrong.
/healthz
/very_common_operation
/may_deadlock_server
Normally, /may_deadlock_server doesn't get enough traffic to cause problems (let's say it gets 10 RPS, /very_common_operation gets 1000 RPS, and the server operates fine). However, a sudden influx of requests to /may_deadlock_server may cause your service to deadlock - and not a lot, let's say on the order of a few hundred requests. Do you still want the server to lock up completely and forever, waiting for a healthz timeout to reboot the service? What if healthz still remains fine but the entire service goes from 10ms response times to 200ms - just enough to cause problems, but not enough to make healthz actually unavailable? And all this just because /may_deadlock_server saw a spike in traffic. Also, the failing healthz check just restarts your service; it won't mitigate the traffic spike if it's sustained. Now consider that /may_deadlock_server is a trivial gadget for an attacker to DoS your site.

Or do you want the web server to keep responding healthily, and rely on metrics and alerts to tell you that /may_deadlock_server is taking a long time to handle requests and impacting performance? Your health monitoring is an absolute last step for automatically mitigating an issue, and it only helps if the bug is some state stuck in a transient condition - if the service restarts into the same conditions that led to the starvation, you're just going to be in an infinite reboot loop, which is worse.
Healthz is not an alternative to metrics and alerting - it's a last stopgap measure to try to automatically get out of a bad situation. But it can also make things worse when the cause lies outside the service's own state - so generally you want the service to remain available if a reboot wouldn't fix the problem.