←back to thread

Bloom Filters by Example

(llimllib.github.io)
243 points ibobev | 4 comments | | HN request time: 0.423s | source
1. verytrivial ◴[] No.44413866[source]
The overlap in concepts required to understand Bloom filters, sets and hash tables is about 95% IMHO. A set is a hash table used for membership tests where you only care about the key, not the value. And a Bloom filter is just a set that exploits the fact that many-to-one hashing 'compresses' the key-space with collisions. It deliberately uses a very collide-y hash function. If a specific key was ever hashed, you WILL get a hit, but there might be other keys that produced the same hash. It's a feature, not a bug.
replies(3): >>44414062 #>>44414176 #>>44416360 #
2. marginalia_nu ◴[] No.44414062[source]
If you've grokked bloom filters, you're very close to also understanding both random projection and certain implementations of locality-sensitive hashes.
3. cherrycherry98 ◴[] No.44414176[source]
Glad to know I'm not alone in my mental modeling of Bloom filters as just hash tables that only track the buckets which have data but not the actual data itself.
4. cortesoft ◴[] No.44416360[source]
I think the main bit your explanation is missing is how a bloom filter uses multiple hash functions to reduce collisions. For example, a bloom filter might have 3 hashes, and all of them have to hit for a key to be known to be in the set. This reduces the likelihood of false positive collisions while keeping the no false negatives guarantee.