
283 points ghuntley | 3 comments
1. kragen No.45136753
This is pretty great. I only learned to use perf_events to see annotated disassembly a few weeks ago, although I don't know how to interpret what I see there yet.

I suspect the mmap() slowness identified here is at least partly fixable, for example by mapping already-in-RAM pages more eagerly. So it wouldn't surprise me (though see above for how much I'm not an expert) if next year mmap were faster than io_uring again.
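A minimal sketch of what eager mapping already looks like from userspace, using Linux's MAP_POPULATE flag (with MADV_WILLNEED as the asynchronous variant); the kernel-side fix I have in mind would do something like this automatically, and error handling is elided here:

    #define _GNU_SOURCE  /* for MAP_POPULATE */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        if (argc < 2) return 1;
        int fd = open(argv[1], O_RDONLY);
        struct stat st;
        fstat(fd, &st);

        /* MAP_POPULATE pre-faults the whole mapping at mmap() time, so
           already-in-page-cache pages land in the page tables up front
           instead of costing one fault per page during the scan. */
        char *p = mmap(NULL, st.st_size, PROT_READ,
                       MAP_PRIVATE | MAP_POPULATE, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        /* Asynchronous alternative: map lazily, then hint readahead with
           madvise(p, st.st_size, MADV_WILLNEED); */

        long newlines = 0;
        for (off_t i = 0; i < st.st_size; i++)
            newlines += (p[i] == '\n');
        printf("%ld\n", newlines);

        munmap(p, st.st_size);
        close(fd);
        return 0;
    }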

replies(1): >>45141994
2. jared_hulbert No.45141994
The io_uring solution avoids this whole mapping effort: it doesn't have to map the already-in-RAM pages at all, it just reuses a small set of buffers. So there is a lot of random, cache-miss-prone work that mmap() has to do and the io_uring solution avoids. If mmap() did that work in the background it would catch up with io_uring, and I'd then have to get a couple more drives to get io_uring to catch back up. With enough drives I'd bet they'd be closer than you think.

I still think I could get the io_uring version to be faster than the mmap() one even if the count never faulted, mostly because the io_uring solution has a smaller TLB footprint and its buffers can fit in L3 cache. But it'd be tough.
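To make the buffer-reuse point concrete, here's a minimal sketch (not my actual benchmark code; error and short-read handling elided) of the shape of the io_uring loop using liburing. It keeps QD reads in flight into a fixed set of buffers, and each buffer is handed straight back for the next chunk once it's been counted:

    /* Build with -luring. QD and BUF_SZ are arbitrary choices. */
    #include <fcntl.h>
    #include <liburing.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/stat.h>
    #include <unistd.h>

    #define QD     8
    #define BUF_SZ (256 * 1024)

    int main(int argc, char **argv)
    {
        if (argc < 2) return 1;
        int fd = open(argv[1], O_RDONLY);
        struct stat st;
        fstat(fd, &st);

        struct io_uring ring;
        io_uring_queue_init(QD, &ring, 0);

        off_t next = 0;
        long newlines = 0;
        int inflight = 0;

        /* Prime the ring: one read in flight per buffer. */
        for (int i = 0; i < QD && next < st.st_size; i++) {
            char *buf = malloc(BUF_SZ);
            struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
            io_uring_prep_read(sqe, fd, buf, BUF_SZ, next);
            io_uring_sqe_set_data(sqe, buf);
            next += BUF_SZ;
            inflight++;
        }
        io_uring_submit(&ring);

        while (inflight > 0) {
            struct io_uring_cqe *cqe;
            io_uring_wait_cqe(&ring, &cqe);
            char *buf = io_uring_cqe_get_data(cqe);
            int n = cqe->res;                 /* bytes read (<0 = error) */
            io_uring_cqe_seen(&ring, cqe);
            inflight--;

            for (int i = 0; i < n; i++)       /* consume the data */
                newlines += (buf[i] == '\n');

            if (next < st.st_size) {          /* reuse the same buffer */
                struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
                io_uring_prep_read(sqe, fd, buf, BUF_SZ, next);
                io_uring_sqe_set_data(sqe, buf);
                next += BUF_SZ;
                inflight++;
                io_uring_submit(&ring);
            }
        }
        printf("%ld\n", newlines);
        io_uring_queue_exit(&ring);
        close(fd);
        return 0;
    }

The working set here is QD * BUF_SZ = 2 MiB of buffers touched over and over, which is why the TLB and L3 behavior is so much friendlier than walking a mapping the size of the whole file.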
replies(1): >>45143282
3. kragen No.45143282
I agree that io_uring is a fundamentally more efficient approach, but I think the performance limits you're currently measuring with mmap() aren't the fundamental ones imposed by the mmap() API, and I think that's what you're saying too?