
173 points daviducolo | 3 comments
1. gwbas1c ◴[] No.43335647[source]
I'm curious why krep runs faster on large files when it is multithreaded?

Naively, isn't IO the bottleneck?

I.e., I'd think that loading a file would be slow enough that krep would be IO-bound?

Do you have a typical ratio of IO time to search time on a modern disk and CPU?

What about a producer-consumer model where one thread reads files and creates an in-memory queue of file contents, and a different thread handles the actual searching without pauses for IO?

Edit: If you're truly CPU-bound, another variation of producer-consumer is to have a single thread read files into queues, and then have multiple threads search through the files. Each thread would search through a single file at a time. This eliminates the shared memory issue that you allude to with overlap.
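
To make the shape of that second variation concrete, here is a minimal sketch in Python. It is not how krep works; `search_files`, `n_searchers`, and `max_queued` are names made up for illustration, and it assumes every file fits in memory. There is no error handling, and in CPython the GIL may limit how much the searchers actually run in parallel, so treat it as a picture of the structure rather than a benchmark.

```python
import queue
import threading
from pathlib import Path

_DONE = object()  # sentinel that tells a searcher thread to stop


def search_files(paths, pattern: bytes, n_searchers: int = 4, max_queued: int = 8):
    """Return {path: match_count} using one reader and several searcher threads."""
    # Bounded queue: the reader blocks instead of loading every file up front.
    work = queue.Queue(maxsize=max_queued)
    results = {}
    lock = threading.Lock()

    def reader():
        # All file IO happens on this one thread.
        for p in paths:
            work.put((p, Path(p).read_bytes()))
        # One sentinel per searcher so each of them exits.
        for _ in range(n_searchers):
            work.put(_DONE)

    def searcher():
        while True:
            item = work.get()
            if item is _DONE:
                return
            path, data = item
            # Each searcher owns a whole file, so there are no chunk boundaries.
            count = data.count(pattern)  # non-overlapping occurrences
            with lock:
                results[path] = count

    threads = [threading.Thread(target=reader)]
    threads += [threading.Thread(target=searcher) for _ in range(n_searchers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The bounded queue is the design choice that matters here: if searching is slower than reading, the reader stalls once `max_queued` files are waiting instead of pulling everything into memory.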

replies(1): >>43337888 #
2. lainzhow ◴[] No.43337888[source]
I didn't read the source, but the description says it uses memory mapping. So my guess here is that IO isn't so much of an issue, since prefetching can hide the latency if you are able to memory map a large enough segment of the file.

Iff the statement about prefetching is true though, I wonder how the prefetching wouldn't be bamboozled by the multiple threads accessing the file.
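
For reference, this is roughly the access pattern I have in mind, as a Python sketch and not krep's actual code (`count_in_range` and `count_matches_mmap` are hypothetical names). The file is mapped once, split into contiguous ranges, and each thread scans its own range front to back. Each range is allowed to read `len(pattern) - 1` bytes past its end so that a match straddling a boundary is counted exactly once, by the range it starts in. It says nothing about whether the OS readahead copes well with several sequential streams over one mapping; that part is still an open question to me.

```python
import mmap
from concurrent.futures import ThreadPoolExecutor


def count_in_range(mm, pattern: bytes, start: int, end: int) -> int:
    """Count matches (overlapping ones included) whose first byte is in [start, end)."""
    # Read up to len(pattern) - 1 bytes past `end`, so a match that straddles
    # the boundary is found by the range in which it starts, and only there.
    limit = min(len(mm), end + len(pattern) - 1)
    count = 0
    pos = mm.find(pattern, start, limit)
    while pos != -1:
        count += 1
        pos = mm.find(pattern, pos + 1, limit)
    return count


def count_matches_mmap(path, pattern: bytes, n_threads: int = 4) -> int:
    if not pattern:
        raise ValueError("pattern must not be empty")
    with open(path, "rb") as f:
        if f.seek(0, 2) == 0:  # mmap cannot map an empty file
            return 0
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            size = len(mm)
            chunk = -(-size // n_threads)  # ceiling division
            ranges = [(s, min(s + chunk, size)) for s in range(0, size, chunk)]
            with ThreadPoolExecutor(max_workers=n_threads) as pool:
                counts = pool.map(lambda r: count_in_range(mm, pattern, *r), ranges)
                return sum(counts)
```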

replies(1): >>43338578 #
3. gwbas1c ◴[] No.43338578[source]
Forgot about memory mapping.

In that case it probably makes more sense to have a shared queue of files, and each thread handles a single file at a time. It'll avoid the overlap issue.
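
Something like this sketch, in Python for brevity. It is an illustration under my own assumptions, not krep's implementation, and `count_per_file` is a made-up name. Worker threads pull paths from a shared queue, and each one memory-maps and scans a whole file by itself, so no file is ever split across threads.

```python
import mmap
import queue
import threading


def count_per_file(paths, pattern: bytes, n_threads: int = 4):
    """Return {path: match_count}; each thread maps and scans one file at a time."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    pending = queue.Queue()
    for p in paths:
        pending.put(p)
    results = {}
    lock = threading.Lock()

    def count_one(path) -> int:
        with open(path, "rb") as f:
            if f.seek(0, 2) == 0:  # mmap cannot map an empty file
                return 0
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                count = 0
                pos = mm.find(pattern)
                while pos != -1:
                    count += 1
                    pos = mm.find(pattern, pos + 1)  # overlapping matches count too
                return count

    def worker():
        while True:
            try:
                path = pending.get_nowait()  # queue is filled before workers start
            except queue.Empty:
                return
            n = count_one(path)
            with lock:
                results[path] = n

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The obvious trade-off is that a single very large file is still scanned by one thread, so this only helps when there are many files to search.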