←back to thread

Bloom Filters by Example

(llimllib.github.io)
243 points ibobev | 1 comments | | HN request time: 0.293s | source
Show context
marginalia_nu ◴[] No.44413026[source]
I have a trick I like:

For sets that are plausibly sometimes going to be small where you're going to do a lot of membership checks, you can speculatively add a 64 bit bloom filter with a trivial hash function.

This sounds really stupid, but the cost of doing this is so small you can do it as a gamble. If it doesn't work out you've added like 10ns to your insertions and membership checks, but when it does work out, you can save an incredible amount of work.

replies(1): >>44413097 #
Sesse__ ◴[] No.44413097[source]
Chromium does this in a bunch of places; the article only links to Safe Browsing using murmur, but the renderer (Blink) generally uses rapidhash and has some of these micro-filters which it uses for e.g.:

  - querySelector() in certain cases
  - Prefiltering hash lookups in CSS buckets
  - Rapid reject of elements when looking for certain Aria attributes (for accessibility)
It's surprising that such tiny filters (32 or 64 bits) work at all, but they often do. There are also some larger Bloom filters around.

(I added some of these)

replies(1): >>44413181 #
marginalia_nu ◴[] No.44413181[source]
They just have a really unintuitive economy where they basically only need to work once or twice to make up for the cost of all the times they don't contribute any benefit.
replies(1): >>44415245 #
1. Sesse__ ◴[] No.44415245[source]
For extra fun, you sometimes can make ideal filters with no false positives, if you know your possible elements ahead of time and you don't insert too many of them. (E.g., for 20 elements, you can construct a 12-bit code where there are guaranteed no false positives as long as you insert at most two elements.)