←back to thread

173 points daviducolo | 8 comments | | HN request time: 1.394s | source | bottom
1. MattPalmer1086 ◴[] No.43337748[source]
Interesting. You may be interested in a more modern search algorithm to replace Boyer Moore. I recently presented HashChain, a very fast sublinear search algorithm at the Symposium of Experimental Algorithms.

https://drops.dagstuhl.de/storage/00lipics/lipics-vol301-sea...

It's the fastest sublinear search algorithm around in almost all cases. I also have a guaranteed worst-case linear version (which is still sublinear and much faster than Boyer Moore in the average case).

Sample code is available here:

https://github.com/nishihatapalmer/HashChain

If you're interested, let me know.

replies(1): >>43342542 #
2. rurban ◴[] No.43342542[source]
A benchmark of all good string-search algos is here: https://rurban.github.io/smart/results/best20/

The ones parent picked are not amongst the best at all. I added a ticket to add your HashChain algos to be tested also.

Preliminary testing shows indeed excellent numbers for hc2-hc4. Comes close to musl

replies(3): >>43343045 #>>43345885 #>>43347251 #
3. MattPalmer1086 ◴[] No.43343045[source]
Amazing, thanks. The sample code is already written to integrate with smart (I have also contributed to the smart tool!).

You may want to add the linear version of HashChain too (in the same GitHub repo).

4. MattPalmer1086 ◴[] No.43345885[source]
Looks like musl is using pointer arithmetic?

Can speed up most of the algorithms in smart by using that. It's faster than using array indexing generally. I didn't do that in the sample code for HashChain simply because it would be an unfair comparison with the others.

5. MattPalmer1086 ◴[] No.43347251[source]
The musl timings are actually incorrect, because it resets the timer on each iteration of the main loop used to count matches. So you only record the time to make the last match, not the total time. I've raised an issue on the GitHub for smart.
replies(1): >>43347920 #
6. rurban ◴[] No.43347920{3}[source]
Oh, oh, thanks!
replies(1): >>43352002 #
7. MattPalmer1086 ◴[] No.43352002{4}[source]
I was surprised that an algorithm from 1991 was the fastest! My first rule of benchmarking is to be suspicious of surprises!
replies(1): >>43390773 #
8. rurban ◴[] No.43390773{5}[source]
Yes, I updated the benchmarks of the best string search algos.