
283 points by ghuntley | 3 comments
jared_hulbert:
Cool. Original author here. AMA.
1. whizzter:
Like people mention, hugetlb etc. could be an improvement, but the core issue holding it down probably has to do with mmap, 4K pages, and paging behaviour: mmap will cause a fault for each "small" 4K page not in memory, causing a jump into the kernel and then whatever machinery is needed to fill in the page cache (and bring the data up from disk, with the associated latency).
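
To make that pattern concrete, here's a rough sketch of the mmap access pattern being described (the filename, the byte-per-page stride, and the madvise hint are illustrative, not from the article; error handling mostly omitted):

    /* Scan a big file through mmap, touching one byte per 4K page.
     * Each page not already in the page cache triggers a fault, a jump
     * into the kernel, and a page-cache fill (possibly a disk read). */
    #include <fcntl.h>
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("data.bin", O_RDONLY);        /* illustrative file */
        struct stat st;
        fstat(fd, &st);

        uint8_t *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }
        madvise(p, st.st_size, MADV_SEQUENTIAL);    /* hint readahead */

        uint64_t sum = 0;
        for (off_t off = 0; off < st.st_size; off += 4096)
            sum += p[off];                /* up to one fault per 4K page */

        printf("%" PRIu64 "\n", sum);
        munmap(p, st.st_size);
        close(fd);
        return 0;
    }

MAP_HUGETLB (or transparent huge pages) is the hugetlb angle: 2MB pages mean roughly 512x fewer faults for the same range.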

This is in contrast with the io_uring worker method, where you keep the thread busy by submitting requests and letting the kernel do the work without expensive crossings.
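
A minimal sketch of that style with liburing, assuming a plain buffered-read loop with a fixed queue depth (DEPTH, BLOCK, and the filename are made up; error handling mostly omitted; link with -luring):

    /* Keep DEPTH reads in flight so the thread stays busy instead of
     * stalling on one page fault at a time. */
    #include <fcntl.h>
    #include <liburing.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/stat.h>
    #include <unistd.h>

    #define DEPTH 32
    #define BLOCK (256 * 1024)

    int main(void) {
        int fd = open("data.bin", O_RDONLY);
        struct stat st;
        fstat(fd, &st);

        struct io_uring ring;
        io_uring_queue_init(DEPTH, &ring, 0);

        char *bufs[DEPTH];
        int free_slots[DEPTH], free_top = 0;
        for (int i = 0; i < DEPTH; i++) {
            bufs[i] = malloc(BLOCK);
            free_slots[free_top++] = i;
        }

        off_t next = 0;
        int inflight = 0;
        uint64_t sum = 0;

        while (next < st.st_size || inflight > 0) {
            /* Top up the submission queue from the free buffer list. */
            while (free_top > 0 && next < st.st_size) {
                int slot = free_slots[--free_top];
                struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
                io_uring_prep_read(sqe, fd, bufs[slot], BLOCK, next);
                io_uring_sqe_set_data(sqe, (void *)(uintptr_t)slot);
                next += BLOCK;
                inflight++;
            }
            io_uring_submit(&ring);

            /* Reap one completion and consume its data. */
            struct io_uring_cqe *cqe;
            io_uring_wait_cqe(&ring, &cqe);
            int slot = (int)(uintptr_t)io_uring_cqe_get_data(cqe);
            for (int i = 0; i < cqe->res; i++)
                sum += (uint8_t)bufs[slot][i];
            io_uring_cqe_seen(&ring, cqe);
            free_slots[free_top++] = slot;
            inflight--;
        }

        printf("%llu\n", (unsigned long long)sum);
        io_uring_queue_exit(&ring);
        close(fd);
        return 0;
    }

The point is that the user thread only crosses into the kernel at submit/wait time, instead of once per faulted page.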

The 2GB fully in-memory case shows the CPU's real perf. The dip at 50GB is interesting; perhaps when going over 50% of memory the Linux kernel evicts pages or does something similar that hurts perf. Maybe plot a graph of perf vs. test size to see if there is an obvious cliff.

2. pianom4n:
The in-memory solution creates a second copy of the data, so at 50GB it doesn't fit in memory anymore. The kernel is forced to drop and then reload part of the cached file.
3. jared_hulbert:
When I run the 50GB in-memory setup I still have 40GB+ of free memory, and I drop the page cache before the run with "sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'", so there wouldn't really be anything to evict from the page cache, and swap isn't changing.

I think I'm crossing the NUMA boundary, which means some percentage of the accesses have higher latency.
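
One way to test that would be to pin both the thread and the allocation to a single node (e.g. numactl --cpunodebind=0 --membind=0, or programmatically) and re-run the 50GB case. A rough libnuma sketch, assuming node 0 has enough memory; link with -lnuma:

    /* Bind the thread and its buffer to NUMA node 0 so no accesses
     * cross the node boundary, then run the in-memory benchmark. */
    #include <numa.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        if (numa_available() < 0) {
            fprintf(stderr, "no NUMA support\n");
            return 1;
        }
        size_t size = 50UL * 1024 * 1024 * 1024;   /* 50GB working set */

        numa_run_on_node(0);                       /* keep the thread on node 0 */
        char *buf = numa_alloc_onnode(size, 0);    /* and its memory on node 0  */
        if (!buf) { perror("numa_alloc_onnode"); return 1; }

        memset(buf, 0, size);                      /* fault the pages in locally */
        /* ... run the in-memory benchmark against buf here ... */

        numa_free(buf, size);
        return 0;
    }

If the dip disappears when everything is on one node, remote-node latency is the likely explanation.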