
173 points daviducolo | 2 comments
kazinator No.43335440
Where are the test cases?

E.g., the chunk-boundary handling in the multi-threaded file search is something that would make me nervous.

It brings new edge cases into a simple search, and those edge cases are not directly related to features in the data; just to the happenstance of where the chunk boundaries land.

Just by adding one character to a file, a character that obviously lies far outside of any match, we shift the boundaries such that a match could break megabytes away from the insertion.

replies(1): >>43336635 #
daviducolo No.43336635
added https://github.com/davidesantangelo/krep/tree/main/test
replies(1): >>43337177 #
scottlamb No.43337177
Do those exercise the logic kazinator called out as test-worthy? To my eye, no. They don't use search_file, and their inputs are smaller than MIN_FILE_SIZE_FOR_THREADS anyway.

I'm inclined to agree with kazinator. The code here: <https://github.com/davidesantangelo/krep/blob/ac6783af42c92f...> looks wrong to me. It potentially increases `chunk_size` but doesn't reduce the number of loop iterations to be consistent with that. Maybe search_thread recognizes the boundary violation and does nothing? That would be a relatively harmless outcome, though launching threads that do nothing is strange. But it's not immediately obvious to me that it even recognizes when end_pos is beyond file_len. The code for handling and skipping overlaps in search_thread also looks test-worthy.

replies(1): >>43338619 #
kazinator No.43338619
The priority is probably to cover the search algorithms themselves first. These are the bits that are likely to be reused in other programs.