
283 points ghuntley | 1 comments
ayende ◴[] No.45135399[source]
This is wrong, because your mmap code is stalling on page faults (including the soft page faults you take when the data is in memory but not yet mapped into your process).
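To make the soft page fault point concrete, here is a minimal sketch (not the benchmark from the post) that maps a freshly written file, which should already sit in the page cache, and reads one byte per page while watching the process's minor fault counter. The helper names `make_file` and `touch_pages` are made up for illustration, and it assumes a Unix-like system where `resource.getrusage` reports `ru_minflt`; some sandboxes report zero there.

```python
import mmap
import os
import resource
import tempfile

PAGE = mmap.PAGESIZE


def make_file(n_pages):
    """Create a temp file of n_pages pages; writing it leaves it in the page cache."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\x01" * (n_pages * PAGE))
    return path


def touch_pages(path):
    """Map `path` read-only and read the first byte of every page.

    Returns (pages_touched, minor_faults_taken_while_touching).
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        before = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
        pages = 0
        for off in range(0, size, PAGE):
            mm[off]  # first access to an unmapped page traps into the kernel
            pages += 1
        after = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
    return pages, after - before


if __name__ == "__main__":
    path = make_file(1024)
    try:
        pages, faults = touch_pages(path)
        print(f"touched {pages} pages, took {faults} minor faults")
    finally:
        os.remove(path)
```

The fault count is usually lower than the page count because the kernel may map several neighbouring cached pages per fault, but each fault is still a synchronous trap taken on the thread doing the counting.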

The io_uring code looks like it is doing all the fetch work in the background (with 6 threads), then just handing the completed buffers to the counter.

Do the same with 6 threads that first read the first byte of each page and then hand that page section to the counter, and you'll find similar performance.

And you can use both madvise and huge pages to control the mmap behavior.
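As a rough illustration of the madvise part, the sketch below applies access-pattern hints to a read-only mapping before scanning it. `count_byte` is a hypothetical helper, not code from the post; it assumes Python 3.8+ on a platform whose `mmap` module exposes `madvise` and the `MADV_*` constants, and it treats every hint as best effort.

```python
import mmap
import os
import tempfile


def count_byte(path, needle, advice_names=("MADV_SEQUENTIAL", "MADV_WILLNEED")):
    """Count occurrences of the single byte `needle` in `path` via mmap.

    Each name in `advice_names` is looked up on the mmap module and passed
    to madvise() if the platform provides it; failures are ignored because
    the hints are advisory only.
    """
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        for name in advice_names:
            advice = getattr(mmap, name, None)
            if advice is None or not hasattr(mm, "madvise"):
                continue  # hint not available on this platform
            try:
                mm.madvise(advice)  # applies to the whole mapping
            except OSError:
                pass  # kernel rejected the hint; carry on without it
        total = 0
        chunk = 1 << 20  # scan 1 MiB at a time to bound the temporary copies
        for start in range(0, len(mm), chunk):
            total += mm[start:start + chunk].count(needle)
    return total


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"ab" * 5000)
    try:
        print(count_byte(path, b"a"))
    finally:
        os.remove(path)
```

`MADV_HUGEPAGE` can be passed the same way where the constant exists, but whether the kernel honours it for a file-backed mapping depends on the kernel version and configuration, so it is left out of the defaults here.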

replies(4): >>45135629 #>>45138707 #>>45140052 #>>45147766 #
lucketone ◴[] No.45135629[source]
It would seem you summarised the whole post.

That’s the point: “mmap” is slow because it is serial.

replies(1): >>45136283 #
arghwhat ◴[] No.45136283[source]
mmap isn't "serial", the code that was using the mapping was "serial". The kernel will happily fill different portions of the mapping in parallel if you have multiple threads fault on different pages.

(That doesn't undermine that io_uring and disk access can be fast, but it's comparing a lazy implementation using approach A with a quite optimized one using approach B, which does not make sense.)

replies(4): >>45136633 #>>45136749 #>>45136761 #>>45136970 #