←back to thread

173 points daviducolo | 4 comments | | HN request time: 0.878s | source
Show context
daviducolo ◴[] No.43333994[source]
You can read my blog post about the project at https://dev.to/daviducolo/introducing-krep-building-a-high-p...
replies(4): >>43335277 #>>43335647 #>>43337748 #>>43339241 #
MattPalmer1086 ◴[] No.43337748[source]
Interesting. You may be interested in a more modern search algorithm to replace Boyer Moore. I recently presented HashChain, a very fast sublinear search algorithm at the Symposium of Experimental Algorithms.

https://drops.dagstuhl.de/storage/00lipics/lipics-vol301-sea...

It's the fastest sublinear search algorithm around in almost all cases. I also have a guaranteed worst-case linear version (which is still sublinear and much faster than Boyer Moore in the average case).

Sample code is available here:

https://github.com/nishihatapalmer/HashChain

If you're interested, let me know.

replies(1): >>43342542 #
rurban ◴[] No.43342542[source]
A benchmark of all good string-search algos is here: https://rurban.github.io/smart/results/best20/

The ones parent picked are not amongst the best at all. I added a ticket to add your HashChain algos to be tested also.

Preliminary testing shows indeed excellent numbers for hc2-hc4. Comes close to musl

replies(3): >>43343045 #>>43345885 #>>43347251 #
1. MattPalmer1086 ◴[] No.43347251[source]
The musl timings are actually incorrect, because it resets the timer on each iteration of the main loop used to count matches. So you only record the time to make the last match, not the total time. I've raised an issue on the GitHub for smart.
replies(1): >>43347920 #
2. rurban ◴[] No.43347920[source]
Oh, oh, thanks!
replies(1): >>43352002 #
3. MattPalmer1086 ◴[] No.43352002[source]
I was surprised that an algorithm from 1991 was the fastest! My first rule of benchmarking is to be suspicious of surprises!
replies(1): >>43390773 #
4. rurban ◴[] No.43390773{3}[source]
Yes, I updated the benchmarks of the best string search algos.