Io_uring, kTLS and Rust for zero syscall HTTPS server

(blog.habets.se)


Seattle3503 ◴[22 Aug 25 05:44 UTC] No.44981374[source]▶

> For example when submitting a write operation, the memory location of those bytes must not be deallocated or overwritten.

> The io-uring crate doesn’t help much with this. The API doesn’t allow the borrow checker to protect you at compile time, and I don’t see it doing any runtime checks either.

I've seen comments like this before[1], and I get the impression that building a safe async Rust library around io_uring is actually quite difficult. Which is sort of a bummer.
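
One commonly discussed way around the buffer-lifetime problem is to have the submission call take ownership of the buffer and hand it back on completion. The following is only a sketch of that idea using the standard library: `FakeRing`, `submit_write` and `complete` are hypothetical names, nothing here talks to the kernel, and it is not the io-uring crate's API.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a ring. A real implementation would pass the
/// buffer's address to the kernel instead of just storing the buffer.
pub struct FakeRing {
    next_id: u64,
    in_flight: HashMap<u64, Vec<u8>>,
}

impl FakeRing {
    pub fn new() -> Self {
        FakeRing { next_id: 0, in_flight: HashMap::new() }
    }

    /// Takes the buffer by value, so the caller can neither free nor
    /// overwrite it while the operation is pending.
    pub fn submit_write(&mut self, buf: Vec<u8>) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.in_flight.insert(id, buf);
        id
    }

    /// Simulated completion: reports the byte count and returns ownership
    /// of the buffer. Unknown or already completed ids yield `None`.
    pub fn complete(&mut self, id: u64) -> Option<(usize, Vec<u8>)> {
        let buf = self.in_flight.remove(&id)?;
        Some((buf.len(), buf))
    }
}
```

The cost of this design is that callers cannot write from a borrowed slice; they have to give up an owned buffer for the duration of the operation.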

IIRC Alice from the tokio team also suggested there hasn't been much interest in pushing through these difficulties more recently, as the current performance is "good enough".

[1] https://boats.gitlab.io/blog/post/io-uring/

replies(7): >>44981390 #>>44981469 #>>44981966 #>>44982846 #>>44983850 #>>44983930 #>>44989979 #

newpavlov ◴[22 Aug 25 10:32 UTC] No.44982846[source]▶

>>44981374 #

This is actually one of my many gripes about Rust async and why I consider it a bad addition to the language in the long term. The fundamental problem is that rust async was developed when epoll was dominant (and almost no one in the Rust circles cared about IOCP) and it has heavily influenced the async design (sometimes indirectly through other languages).

Think about it for a second. Why do we not have this problem with "synchronous" syscalls? When you call `read` you also "pass a mutable borrow" of the buffer to the kernel, but it maps well onto the Rust ownership/borrow model, since the syscall blocks execution of the thread and there is no way for user code to prevent that. With the poll-based async model you side-step this issue, since you use the same "sync" syscalls, just ones that are guaranteed to return without blocking.
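
To illustrate the blocking case, here is a small sketch using `std::io::Read` on an in-memory `Cursor` (an assumption made so the example runs without a file or socket; `read_twice` and `read_demo` are made-up names). The `&mut` borrow of the buffer lasts exactly as long as the call.

```rust
use std::io::{Cursor, Read};

/// Reads twice into the same caller-owned buffer. Each `&mut buf` borrow
/// ends when `read` returns, so the buffer can be reused in between.
pub fn read_twice(src: &mut impl Read) -> std::io::Result<(usize, usize)> {
    let mut buf = [0u8; 4];
    let first = src.read(&mut buf)?; // borrow starts and ends on this line
    buf[0] = 0; // fine: nobody else holds the buffer any more
    let second = src.read(&mut buf)?; // same memory, new borrow
    Ok((first, second))
}

/// Runs the function above against six bytes of in-memory data.
pub fn read_demo() -> (usize, usize) {
    let mut src = Cursor::new(b"abcdef".to_vec());
    read_twice(&mut src).expect("in-memory reads do not fail")
}
```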

For completion-based IO to work properly with the ownership/borrow model we have to guarantee that the task code will not continue execution until it receives a completion event. You simply cannot do it with state machines polled in user code. But the threading model fits here perfectly! If we are to replace threads with "green" threads, user Rust code will look indistinguishable from "synchronous" code. And no, the green threads model can work properly on embedded systems as demonstrated by many RTOSes.
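
A rough sketch of why a polled state machine cannot give that guarantee: a future can be dropped after any poll, which ends its borrows even though no completion has arrived. `PendingRead` and `poll_once_then_drop` are hypothetical, and no real I/O is submitted here.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical operation that borrows its buffer and never completes.
pub struct PendingRead<'a> {
    pub buf: &'a mut [u8],
}

impl Future for PendingRead<'_> {
    type Output = usize;

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<usize> {
        // A real completion-based read would have handed `buf` to the
        // kernel by this point.
        Poll::Pending
    }
}

/// Waker that does nothing; enough to poll a future by hand.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls the operation once, drops it, then touches the buffer.
/// Returns whether the operation was still pending when it was dropped.
pub fn poll_once_then_drop(buf: &mut [u8]) -> bool {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut op = Box::pin(PendingRead { buf: &mut *buf });
    let pending = op.as_mut().poll(&mut cx).is_pending();
    drop(op); // cancellation: the borrow ends without any completion event
    buf[0] = 42; // accepted by the borrow checker; a real kernel could still be writing here
    pending
}
```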

There are several ways we could've done it without making the async runtime mandatory for all targets (the main reason why green threads were removed from Rust 1.0). My personal favorite is the introduction of separate "async" targets.

Unfortunately, the Rust language developers made a bet on the unproven polling stackless model because of the promised efficiency, and we are in the process of finding out whether the bet pays off or not.

replies(3): >>44983562 #>>44984589 #>>44984882 #

kibwen ◴[22 Aug 25 12:13 UTC] No.44983562[source]▶

>>44982846 #

> The fundamental problem is that rust async was developed when epoll was dominant (and almost no one in the Rust circles cared about IOCP)

No, this is a mistaken retelling of history. The Rust developers were not ignorant of IOCP, nor were they zealous about any specific async model. They went looking for a model that fit with Rust's ethos, and completion didn't fit. Aaron Turon has an illuminating post from 2016 explaining their reasoning: https://aturon.github.io/tech/2016/09/07/futures-design/

See the section "Defining futures":

There’s a very standard way to describe futures, which we found in every existing futures implementation we inspected: as a function that subscribes a callback for notification that the future is complete.

Note: In the async I/O world, this kind of interface is sometimes referred to as completion-based, because events are signaled on completion of operations; Windows’s IOCP is based on this model.

[...] Unfortunately, this approach nevertheless forces allocation at almost every point of future composition, and often imposes dynamic dispatch, despite our best efforts to avoid such overhead.

[...] TL;DR, we were unable to make the “standard” future abstraction provide zero-cost composition of futures, and we know of no “standard” implementation that does so.

[...] After much soul-searching, we arrived at a new “demand-driven” definition of futures.
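
A heavily simplified sketch of the contrast described in the quoted sections, assuming made-up types (`CallbackFuture`, `PollFuture`, `Ready`, `Map`) rather than the real futures crate API, and leaving out wakers entirely: the callback style stores a type-erased closure, while the demand-driven style composes as plain generic structs.

```rust
/// Callback ("completion") style: the future stores whoever subscribed.
/// Erasing the callback's type costs a `Box` and dynamic dispatch.
pub struct CallbackFuture<T> {
    callback: Option<Box<dyn FnOnce(T)>>,
}

impl<T> CallbackFuture<T> {
    pub fn new() -> Self {
        CallbackFuture { callback: None }
    }

    pub fn subscribe(&mut self, f: impl FnOnce(T) + 'static) {
        self.callback = Some(Box::new(f)); // heap allocation per subscription
    }

    pub fn complete(&mut self, value: T) {
        if let Some(cb) = self.callback.take() {
            cb(value);
        }
    }
}

/// Demand-driven style: the consumer asks whether a value is ready.
/// `None` stands in for "not ready yet".
pub trait PollFuture {
    type Output;
    fn poll(&mut self) -> Option<Self::Output>;
}

/// A future that is ready immediately.
pub struct Ready<T>(pub Option<T>);

impl<T> PollFuture for Ready<T> {
    type Output = T;
    fn poll(&mut self) -> Option<T> {
        self.0.take()
    }
}

/// Composition is an ordinary generic struct: no `Box`, no `dyn`.
pub struct Map<F, G> {
    pub inner: F,
    pub f: Option<G>,
}

impl<F, G, U> PollFuture for Map<F, G>
where
    F: PollFuture,
    G: FnOnce(F::Output) -> U,
{
    type Output = U;
    fn poll(&mut self) -> Option<U> {
        let value = self.inner.poll()?;
        let f = self.f.take().expect("polled again after completion");
        Some(f(value))
    }
}
```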

I'm not sure where this meme came from where people seem to think that the Rust devs rejected a completion-based scheme because of some emotional affinity for epoll. They spent a long time thinking about the problem, and came up with a solution that worked best for Rust's goals. The existence of a usable io_uring in 2016 wouldn't have changed the fundamental calculus.

replies(2): >>44983784 #>>44992207 #

newpavlov ◴[22 Aug 25 12:34 UTC] No.44983784[source]▶

>>44983562 #

>which we found in every existing futures implementation we inspected

This is exactly what I meant when I wrote about the indirect influence from other languages. People may dress it up as much as they want, but it's clear that polling was the most important model at the time (outside of the Windows world) and a lot of design consideration was put into being compatible with it. The Rust async model literally uses the polling terminology in its most fundamental interfaces!

>this approach nevertheless forces allocation at almost every point of future composition

This is only true in the narrow world of modeling async execution with futures. Do you see heap allocations in Go on each equivalent of "future composition" (i.e. every function call)? No, you do not. With the stackful models you allocate a full stack for your task and you model function calls as plain function calls without any future composition shenaniganry.

Yes, the stackless model is more efficient memory-wise and allows for some additional useful tricks (like sharing future stacks in `join!`). But the stackful model is perfectly efficient for 95+% of use cases, fits better with the borrow/ownership model, does not result in the `.await` noise, does not lead to the horrible ecosystem split (including the split between different executors), and does not need language-breaking hacks like `Pin` (see the `noalias` exception made for it). And I believe it's possible to close the memory efficiency gap between the models with certain compiler improvements (tracking a maximum stack usage bound for functions and introducing a separate async ABI with two separate stacks).
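
As a small illustration of the memory side of this tradeoff: in the stackless model, whatever lives across an `.await` is stored inside the future value itself, so its size can be inspected directly. This is a sketch with made-up function names, and the exact size is a compiler detail, so only a lower bound is checked.

```rust
use std::mem::size_of_val;

/// Trivial await point.
async fn yield_point() {}

/// Holds a 1 KiB buffer across an `.await`, so the buffer has to be
/// stored in the future rather than on a thread stack.
pub async fn with_buffer(seed: u8) -> u8 {
    let buf = [seed; 1024];
    yield_point().await;
    buf[1023]
}

/// Size in bytes of the state machine generated for `with_buffer`.
pub fn state_size() -> usize {
    let fut = with_buffer(1); // nothing runs yet; this only builds the state machine
    size_of_val(&fut)
}
```

A stackful task would instead reserve a whole stack up front, sized for the deepest call chain it might ever make.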

>The existence of a usable io_uring in 2016 wouldn't have changed the fundamental calculus.

IIRC the first usable versions of io-uring were released approximately during the time when Rust async was undergoing stabilization. I am really confident that if the async system was designed today we would've had a totally different model. The importance of completion-based models has only grown since then, not only because of sane async file IO, but also because of Spectre and Meltdown.

replies(1): >>44984545 #

1. kibwen ◴[22 Aug 25 13:37 UTC] No.44984545[source]▶

>>44983784 #

> But the stackful model is

The existence of advantages doesn't change anything here. The problem is that the disadvantages made this approach a non-starter, despite a lot of effort to make it work. Tradeoffs exist in language design, and the approaches were judged accordingly. What works for Go doesn't necessarily work for Rust, because they target different domains.

> I am really confident that if the async system was designed today we would've had a totally different model

No, without solving the original problems, the outcome would be the same. The Rust devs at the time were well aware of io_uring.

replies(1): >>44985952 #

2. no_wizard ◴[22 Aug 25 15:42 UTC] No.44985952[source]▶

>>44984545 (TP) #

What were the original problems exactly? From what I recall they effectively boiled down to size concerns, because Rust saw itself as a C/C++ successor and didn't want to lose any adoption among the embedded systems audience.

replies(2): >>44986171 #>>44986648 #

3. const_cast ◴[22 Aug 25 16:00 UTC] No.44986171[source]▶

>>44985952 #

I mean from an outsider's perspective on Rust this is how I saw it.

Rust is in a strange place because they're a systems language directly competing with C++. Async, in general, doesn't vibe with that but green threads definitely don't.

If you're gonna do green threads you might as well throw in a GC too and get a whole runtime. And now you're writing Go.

replies(2): >>44986288 #>>44986357 #

4. zozbot234 ◴[22 Aug 25 16:08 UTC] No.44986288{3}[source]▶

>>44986171 #

On the contrary, stackless async can "vibe" quite well with deep embedded workloads that also require a low-level language like C/C++. There are very few meaningful alternatives to Rust in that space.

5. no_wizard ◴[22 Aug 25 16:15 UTC] No.44986357{3}[source]▶

>>44986171 #

I don't think doing green threads equates to 'well might as well have a GC now!'. I think they made the wrong tradeoff too, because hardware will inevitably catch up to the language requirements, especially if it's desirable to use. Not to mention over time things can be made more efficient from the Rust side as well, with compiler improvements, better programming techniques etc.

I think they made the wrong bet, personally. Having worked in enough languages that have function coloring problems, I would treat avoiding it as a line-in-the-sand item in language design, regardless of tradeoffs.

replies(2): >>44990669 #>>44992857 #

6. kibwen ◴[22 Aug 25 16:40 UTC] No.44986648[source]▶

>>44985952 #

Have you read the article by Aaron Turon linked above? It's very informative, and if you have any questions about specific parts of it, feel free to reference them. In particular it boils down to the fact that Rust bends over backwards to avoid putting anything that requires allocation or dynamic dispatch in the core language (e.g. Rust's closures are fascinating in that they're stack-allocated, like C++'s, while also playing nicely with the borrow checker, which is quite a feat). This property extends to the current design of async, which makes async suitable for embedded devices, which is extremely cool (check out the Embassy project for the state of the art in this space).
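
For a concrete picture of the closure point, a minimal sketch (function names are made up for illustration): the closure borrows a local from the enclosing stack frame and is passed to a generic function, so no `Box` and no `dyn` appear anywhere.

```rust
/// Generic over the closure type, so each call site is monomorphized
/// and the closure is called statically.
pub fn apply_twice<F: Fn(u32) -> u32>(f: F, x: u32) -> u32 {
    f(f(x))
}

/// Builds a closure that borrows a local variable and runs it twice.
pub fn closure_demo() -> u32 {
    let offset: u32 = 10;
    // Captures `offset` by reference; the borrow checker ensures the
    // closure does not outlive this frame.
    let add = |x: u32| x + offset;
    apply_twice(add, 1)
}
```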

7. surajrmal ◴[22 Aug 25 22:28 UTC] No.44990669{4}[source]▶

>>44986357 #

There are other languages with green threads and folks are free to use those. Zig is trying to do interesting things with stackful coroutines.

I don't think I or most systems programmers would have chosen Rust if it required green threads instead of stackless coroutines for async. If you work on embedded or low level environments like kernels and whatnot, you need something that falls back to callbacks for async. I'm sure folks who work on servers would have been fine with green threads, but they were not the target audience for Rust. Being upset because you fall outside the target demographic of a particular language doesn't mean they made the wrong choice. It just means you should look for something else.

8. fpoling ◴[23 Aug 25 03:26 UTC] No.44992857{4}[source]▶

>>44986357 #

Hardware does not catch up with language requirements. If anything, it is languages/compilers that catch up with hardware, as with SSE instructions and loop parallelism.

For me the mistake that Rust made was that it tried too hard to behave like C/C++ with its single execution stack.

Ada uses two stacks, allowing a callee to return stack-allocated arrays to the caller. Not only does this avoid dynamic allocations in many cases where C++ allocates memory, it also reduces the need for pointers, making the code safer even without the borrow checker.

If, instead of async, Rust had spent its efforts on implementing something like that, or even on allowing explicit stack control from safe code so that green threads or coroutines could be implemented as a library, it could be more compatible with the io_uring world.

replies(1): >>44993588 #

9. zozbot234 ◴[23 Aug 25 05:57 UTC] No.44993588{5}[source]▶

>>44992857 #

> Ada uses two stacks, allowing a callee to return stack-allocated arrays to the caller.

You could do this manually by threading a pointer to a separately-allocated stack (could be on the heap or perhaps just a static allocation) as an extra function parameter. It's just a very simple case of arena allocation, with similar advantages and disadvantages. (For example, the caller must ensure that enough space is available on the dynamic-data stack for anything that the callee might want to push onto it.) In general it's just not really worth it, because it turns out that dynamically-sized data that one would not want to simply place on the heap is rare anyway.
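
A rough sketch of that manual approach, assuming a hypothetical `DataStack` type backed by a `Vec` (a static array would work the same way): the caller owns the secondary stack and threads it through as an extra parameter, and the callee returns a dynamically sized slice that lives on it.

```rust
/// Hypothetical secondary stack: a fixed block of bytes plus a top index.
pub struct DataStack {
    bytes: Vec<u8>,
    top: usize,
}

impl DataStack {
    pub fn with_capacity(capacity: usize) -> Self {
        DataStack { bytes: vec![0; capacity], top: 0 }
    }

    /// Reserves `len` bytes, or returns `None` if the caller did not
    /// provide enough space.
    pub fn push(&mut self, len: usize) -> Option<&mut [u8]> {
        let end = self.top.checked_add(len)?;
        if end > self.bytes.len() {
            return None;
        }
        let start = self.top;
        self.top = end;
        Some(&mut self.bytes[start..end])
    }

    /// Releases everything pushed so far.
    pub fn reset(&mut self) {
        self.top = 0;
    }
}

/// The callee returns an array whose length is only known at run time.
/// It outlives the callee's frame because it lives on the caller's stack.
pub fn countdown<'a>(stack: &'a mut DataStack, n: u8) -> Option<&'a mut [u8]> {
    let out = stack.push(n as usize)?;
    for (i, slot) in out.iter_mut().enumerate() {
        *slot = n - i as u8; // i < n, so this cannot underflow
    }
    Some(out)
}
```

The `None` path is the disadvantage mentioned above: the caller has to size the secondary stack for whatever the callee might push.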
