
283 points ghuntley | 4 comments
1. titanomachy ◴[] No.45133648[source]
Very interesting article, thanks for publishing these tests!

Is the manual loop unrolling really necessary to get vectorized machine code? I would have guessed that the highest optimization levels in LLVM would be able to figure it out from the basic code. That's a very uneducated guess, though.

Also, curious if you tried using the MAP_POPULATE option with mmap. Could that improve the bandwidth of the naive in-memory solution?

> humanity doesn't have the silicon fabs or the power plants to support this for every moron vibe coder out there making an app.

lol. I bet if someone took the time to make a high-quality well-documented fast-IO library based on your io_uring solution, it would get use.

replies(1): >>45134258 #
2. jared_hulbert ◴[] No.45134258[source]
YES! gcc and clang don't like to optimize this, but they do if you hardcode size_bytes to an aligned value. It kind of makes sense: what if a user passes size_bytes as 3? With enough effort the compilers could handle this, but it's a lot to ask.
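To make the size_bytes point concrete, here is a minimal sketch of a counting loop that splits the work into a fixed-width main part and a scalar tail, so an odd size such as 3 is still handled. The function name `count_tens`, the block width of 16, and the assumption that the data is an array of 32-bit integers are all illustrative; none of them come from the article's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the article's code): count elements equal to 10.
 * Assumes the buffer holds 32-bit integers; trailing bytes that do not
 * form a whole element are ignored. */
size_t count_tens(const uint32_t *data, size_t size_bytes)
{
    size_t n = size_bytes / sizeof(uint32_t); /* whole elements only */
    size_t main_n = n - (n % 16);             /* largest multiple of 16 <= n */
    size_t count = 0;

    /* Main part: the inner trip count is a compile-time constant,
     * which is the shape compilers tend to find easiest to vectorize. */
    for (size_t i = 0; i < main_n; i += 16) {
        for (size_t j = 0; j < 16; j++)
            count += (data[i + j] == 10);
    }

    /* Scalar tail: the 0..15 leftover elements. */
    for (size_t i = main_n; i < n; i++)
        count += (data[i] == 10);

    return count;
}
```

Whether a particular gcc or clang version actually vectorizes the main loop is something to confirm in the generated assembly (for example with `-O3 -S`) rather than assume from the source.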

I just ran it with MAP_POPULATE, and the results are interesting.

It speeds up the counting loop. It's as fast as or faster than my read()-into-a-malloced-buffer tests.

HOWEVER... it takes longer overall to populate the buffer. The end result is that the full test runs about 2.5 seconds slower than the original. I did not guess that one correctly.

time ./count_10_unrolled ./mnt/datafile.bin 53687091200
unrolled loop found 167802249 10s processed at 5.39 GB/s
./count_10_unrolled ./mnt/datafile.bin 53687091200 5.58s user 6.39s system 99% cpu 11.972 total

time ./count_10_populate ./mnt/datafile.bin 53687091200
unrolled loop found 167802249 10s processed at 8.99 GB/s
./count_10_populate ./mnt/datafile.bin 53687091200 5.56s user 8.99s system 99% cpu 14.551 total
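For anyone who wants to reproduce the comparison, the difference between the two runs comes down to one mmap flag. The sketch below is an assumption about how such a test could be set up, not the code behind the numbers above; `map_file` and its parameters are hypothetical names. On Linux, MAP_POPULATE pre-faults the mapping inside the mmap() call, so the page-fault cost moves out of the counting loop and into setup, which is consistent with a faster loop but a longer total run.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#ifndef MAP_POPULATE
#define MAP_POPULATE 0 /* not Linux: fall back to ordinary lazy faulting */
#endif

/* Hypothetical helper: map a whole file read-only.
 * If populate is nonzero, ask the kernel to pre-fault the pages up front.
 * Returns MAP_FAILED on error; on success stores the length in *len_out. */
void *map_file(const char *path, int populate, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return MAP_FAILED;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return MAP_FAILED;
    }

    int flags = MAP_PRIVATE;
    if (populate)
        flags |= MAP_POPULATE; /* pay the page-fault cost here, not in the loop */

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, flags, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */

    if (p != MAP_FAILED)
        *len_out = (size_t)st.st_size;
    return p;
}
```

Timing the mmap() call and the counting loop separately, rather than only the whole process, would show where the extra seconds go.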

replies(2): >>45134454 #>>45135200 #
3. mischief6 ◴[] No.45134454[source]
it could be interesting to see what ispc does with similar code.
4. titanomachy ◴[] No.45135200[source]
Hmm, I expected some slowdown from POPULATE, but I thought it would still come out ahead. Interesting!