
173 points daviducolo | 2 comments
kazinator No.43335440
Where are the test cases?

E.g., the chunk-boundary handling in the multi-threaded file search is something that would make me nervous.

It brings new edge cases into a simple search, and those edge cases are not directly related to features in the data; just to the happenstance of where the chunk boundaries land.

Just by adding one character to a file, a character that obviously lies far outside of any match, we shift the boundaries such that a match could break megabytes away from the insertion.

replies(1): >>43336635 #
daviducolo No.43336635
added https://github.com/davidesantangelo/krep/tree/main/test
replies(1): >>43337177 #
scottlamb No.43337177
Do those exercise the logic kazinator called out as test-worthy? To my eye, no. They don't use search_file, and their inputs are smaller than MIN_FILE_SIZE_FOR_THREADS anyway.

I'm inclined to agree with kazinator. The code here: <https://github.com/davidesantangelo/krep/blob/ac6783af42c92f...> looks wrong to me. It potentially increases `chunk_size` but doesn't reduce the number of loop iterations to be consistent with that. Maybe search_thread recognizes the boundary violation and does nothing? That would be a relatively harmless outcome, though launching threads that do nothing is strange. But it's not immediately obvious to me that it even recognizes when end_pos is beyond file_len. The code for handling and skipping overlaps in search_thread also looks test-worthy.

replies(1): >>43338619 #
kazinator No.43338619
The priority is probably to cover the search algorithms themselves first. These are the bits that are likely to be reused in other programs.