Io_uring, kTLS and Rust for zero syscall HTTPS server

(blog.habets.se)

495 points guntars | 1 comments | 22 Aug 25 03:51 UTC | HN request time: 0s | source

Show context

Seattle3503 ◴[22 Aug 25 05:44 UTC] No.44981374[source]▶

> For example when submitting a write operation, the memory location of those bytes must not be deallocated or overwritten.

> The io-uring crate doesn’t help much with this. The API doesn’t allow the borrow checker to protect you at compile time, and I don’t see it doing any runtime checks either.

I've seen comments like this before[1], and I get the impression that building a a safe async Rust library around io_uring is actually quite difficult. Which is sort of a bummer.

IIRC Alice from the tokio team also suggested there hasn't been much interest in pushing through these difficulties more recently, as the current performance is "good enough".

[1] https://boats.gitlab.io/blog/post/io-uring/

replies(7): >>44981390 #>>44981469 #>>44981966 #>>44982846 #>>44983850 #>>44983930 #>>44989979 #

newpavlov ◴[22 Aug 25 10:32 UTC] No.44982846[source]▶

>>44981374 #

This actually one of my many gripes about Rust async and why I consider it a bad addition to the language in the long term. The fundamental problem is that rust async was developed when epoll was dominant (and almost no one in the Rust circles cared about IOCP) and it has heavily influenced the async design (sometimes indirectly through other languages).

Think about it for a second. Why do we not have this problem with "synchronous" syscalls? When you call `read` you also "pass mutable borrow" of the buffer to the kernel, but it maps well into the Rust ownership/borrow model since the syscall blocks execution of the thread and there are no ways to prevent it in user code. With poll-based async model you side-step this issues since you use the same "sync" syscalls, but which are guaranteed to return without blocking.

For a completion-based IO to work properly with the ownership/borrow model we have to guarantee that the task code will not continue execution until it receives a completion event. You simply can not do it with state machines polled in user code. But the threading model fits here perfectly! If we are to replace threads with "green" threads, user Rust code will look indistinguishable from "synchronous" code. And no, the green threads model can work properly on embedded systems as demonstrated by many RTOSes.

There are several ways of how we could've done it without making the async runtime mandatory for all targets (the main reason why green threads were removed from Rust 1.0). My personal favorite is introduction of separate "async" targets.

Unfortunately, the Rust language developers made a bet on the unproved polling stackless model because of the promised efficiency and we are in the process of finding out whether the bet plays of or not.

replies(3): >>44983562 #>>44984589 #>>44984882 #

duped ◴[22 Aug 25 13:41 UTC] No.44984589[source]▶

>>44982846 #

> You simply can not do it with state machines polled in user code

That's not really true. The only guarantees in Rust futures are that they are polled() once and must have their Waker's wake() called before they are polled again. A completion based future submits the request on first poll and calls wake() on completion. That's kind of the interesting design of futures in Rust - they support polling and completion.

The real conundrum is that the futures are not really portable across executors. For io_using for example, the executor's event loop is tightly coupled with submission and completion. And due to instability of a few features (async trait, return impl trait in trait, etc) there is not really a standard way to write executor independent async code (you can, some big crates do, but it's not necessarily trivial).

Combine that with the fact that container runtimes disable io_uring by default and most people are deploying async web servers in Docker containers, it's easy to see why development has stalled.

It's also unfair to mischaracterize design goals and ideas from 2016 with how the ecosystem evolved over the last decade, particularly after futures were stabilized before other language items and major executors became popular. If you look at the RFCs and blog posts back then (eg: https://aturon.github.io/tech/2016/09/07/futures-design/) you can see why readiness was chosen over completion, and how completion can be represented with readiness. He even calls out how naïve completion (callbacks) leads to more allocation on future composition and points to where green threads were abandoned.

replies(3): >>44984765 #>>44988043 #>>44988514 #

newpavlov ◴[22 Aug 25 13:57 UTC] No.44984765[source]▶

>>44984589 #

No, the fundamental problem (in the context of io-uring) is that futures are managed by user code and can be dropped at any time. This often referred as "cancellation safety". Imagine a future has initialized completion-based IO with buffer which is part of the future state. User code can simply drop the future (e.g. if it was part of `select!`) and now we have a huge problem on our hands: the kernel will write into a dropped buffer! In the synchronous context it's equivalent to de-allocating thread stack under foot of the thread which is blocked on a synchronous syscall. You obviously can do it (using safe code) in thread-based code, but it's fine to do in async.

This is why you have to use various hacks when using io-uring based executors with Rust async (like using polling mode or ring-owned buffers and additional data copies). It could be "resolved" on the language level with an additional pile of hacks which would implement async Drop, but, in my opinion, it would only further hurt consistency of the language.

>He even calls out how naïve completion (callbacks) leads to more allocation on future composition and points to where green threads were abandoned.

I already addressed it in the other comment.

replies(2): >>44984811 #>>44985652 #

vlovich123 ◴[22 Aug 25 15:15 UTC] No.44985652[source]▶

>>44984765 #

I really don’t understand this argument. If you force the user to transfer ownership of the buffer into the I/O subsystem, the system can make sure to transfer ownership of the buffer into the async runtime, not leaving it held within the cancellable future and the future returns that buffer which is given back when the completion is received from the kernel. What am I missing?

replies(2): >>44985725 #>>44987286 #

newpavlov ◴[22 Aug 25 15:22 UTC] No.44985725[source]▶

>>44985652 #

The goal of the async system is to allow users to write synchronous looking code which is executed asynchronously with all associated benefits. "Forcing" users to do stuff like this shows the clear failure to achieve this goal. Additionally, passing ownership like this (instead of passing mutable borrow) arguably goes against the zero-cost principle.

replies(1): >>44985850 #

vlovich123 ◴[22 Aug 25 15:33 UTC] No.44985850[source]▶

>>44985725 #

I don’t follow the zero copy argument. You pass in an owned buffer and get an owned buffer back out. There’s no copying happening here. It’s your claim that async is supposed to look like synchronous code but I don’t buy it. I don’t see why that’s a goal. Synchronous is an anachronistic software paradigm for a computer hardware architecture that never really existed (electronics are concurrent and asynchronous by nature) and cause a lot of performance problems trying to make it work that way.

Indeed, one thing I’ve always wondered is if you can submit a read request for a page aligned buffer and have the kernel arrange for data to be written directly into that without any additional copies. That’s probably not possible since there’s routing happening in the kernel and it accumulates everything into sk_buffs.

But maybe it could arrange for the framing part of the packet and the data to be decoupled so that it can just give you a mapping into the data region (maybe instead of you even providing a buffer, it gives you back an address mapped into your space). Not sure if that TLB update might be more expensive than a single copy.

replies(2): >>44986038 #>>44986257 #

newpavlov ◴[22 Aug 25 15:49 UTC] No.44986038[source]▶

>>44985850 #

You have an inevitable overhead of managing the owned buffer when compared against simply passing mutable borrow to an already existing buffer. Imagine if `io::Read` APIs were constructed as `fn read(&mut self, buf: Vec<u8>) -> io::Resul<Vec<u8>>`.

Parity with synchronous programming is an explicit goal of Rust async declared many times (e.g. see here https://github.com/rust-lang/rust-project-goals/issues/105). I agree with your rant about the illusion of synchronicity, but it does not matter. The synchronous abstraction is immensely useful in practice and less leaky it is, the better.

replies(2): >>44986309 #>>44987910 #

namibj ◴[22 Aug 25 16:10 UTC] No.44986309{3}[source]▶

>>44986038 #

The problem is that the ring requires suitably locked memory which won't be subject to swapping, thus inherently forcing a different memory type if you want the extra-low-overhead/extra-scalable operation.

It makes sense to ask the ring wrapper for memory that you can emplace your payload into before submitting the IO if you want to use zero-copy.

replies(1): >>44987251 #

1. surajrmal ◴[22 Aug 25 17:30 UTC] No.44987251{4}[source]▶

>>44986309 #

newpavlov seems to operate in a theoretical bubble. Until something like he describes is published publicly where we talk more concretely, it's not worth engaging. Requiring special buffers is very typical for any hardware offload with zero copy. It isn't a leaky abstraction. Async rust is well designed and I've not seen anything rival it. If there are problems it's in the libraries built on top like tokio.

↑