←back to thread

Bloom Filters by Example

(llimllib.github.io)
243 points ibobev | 1 comments | | HN request time: 0.207s | source
Show context
anon-3988 ◴[] No.44414230[source]
I have a specific use case where I know from startup the list of words that I want to find and this will not change for the duration of the program. Can anyone think of a low latency solution to this? I have tried a lot of variations of bloom filter, perfect hash, linear lookup, binary search, set search etc

It appears that perfect hash is the one that works the best for my use case.

replies(1): >>44414386 #
jerf ◴[] No.44414386[source]
You're saying you can use a perfect hash also implies you know you will only find those values? If so, then yes, the name is accurate and is probably a very good choice.

But if you put things into the perfect hash function it is not expecting, some fraction of them will collide.

If you're searching for a fixed set, look at the Ragel library. Compile-time generation of the search in a way that is very hard to beat.

replies(1): >>44419602 #
1. anon-3988 ◴[] No.44419602[source]
> You're saying you can use a perfect hash also implies you know you will only find those values?

No, 99.999999% of the time the values is NOT what I am looking for. So the error rate is EXTREMELY high, so its very important that the failure path is quick.