←back to thread

Bloom Filters by Example

(llimllib.github.io)

243 points ibobev | 1 comments | 29 Jun 25 11:56 UTC | HN request time: 0.259s | source

Show context

verytrivial ◴[29 Jun 25 15:26 UTC] No.44413866[source]▶

>>44412370 (OP) #

The overlap in concepts required to understand Bloom filters, sets and hash tables is about 95% IMHO. A set is a hash table used for membership tests where you only care about the key, not the value. And a Bloom filter is just a set that exploits the fact that many-to-one hashing 'compresses' the key-space with collisions. It deliberately uses a very collide-y hash function. If a specific key was ever hashed, you WILL get a hit, but there might be other keys that produced the same hash. It's a feature, not a bug.

replies(3): >>44414062 #>>44414176 #>>44416360 #

1. cortesoft ◴[29 Jun 25 20:57 UTC] No.44416360[source]▶

I think the main bit your explanation is missing is how a bloom filter uses multiple hash functions to reduce collisions. For example, a bloom filter might have 3 hashes, and all of them have to hit for a key to be known to be in the set. This reduces the likelihood of false positive collisions while keeping the no false negatives guarantee.