And it's all for, what? A little memory for thread stacks (most of which ends up being a wash because of all the async contexts being tossed around anyway -- those are still stacks and still big!)? Some top-end performance for people chasing C10k numbers in a world that has scaled into datacenters for a decade anyway?
Not worth it. IMHO it's time to put this to bed.
[1] No one in that thread or post has a good summary, but it's "Rust futures consume wakeup events from fair locks that only emit one event, so can deadlock if they aren't currently being selected and will end up waiting for some other event before doing so."