←back to thread

Bloom Filters by Example

(llimllib.github.io)
243 points ibobev | 1 comments | | HN request time: 0.259s | source
Show context
verytrivial ◴[] No.44413866[source]
The overlap in concepts required to understand Bloom filters, sets and hash tables is about 95% IMHO. A set is a hash table used for membership tests where you only care about the key, not the value. And a Bloom filter is just a set that exploits the fact that many-to-one hashing 'compresses' the key-space with collisions. It deliberately uses a very collide-y hash function. If a specific key was ever hashed, you WILL get a hit, but there might be other keys that produced the same hash. It's a feature, not a bug.
replies(3): >>44414062 #>>44414176 #>>44416360 #
1. cortesoft ◴[] No.44416360[source]
I think the main bit your explanation is missing is how a bloom filter uses multiple hash functions to reduce collisions. For example, a bloom filter might have 3 hashes, and all of them have to hit for a key to be known to be in the set. This reduces the likelihood of false positive collisions while keeping the no false negatives guarantee.