I find it interesting that, with ~63k entries, they didn't just use a Bloom filter to check whether the entity they're working with has already been seen. Granted, they still need to store the data, but I think a Bloom filter would be a more effective way to test whether an item already exists.
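
To make that concrete, here's a rough sketch of what I mean. None of this comes from their code: the `BloomFilter` class, the string keys, and the 1% false-positive target are all my own assumptions. The sizing uses the standard formulas, and the bit positions come from double hashing over a single SHA-256 digest.

```python
import hashlib
import math


class BloomFilter:
    """Hypothetical minimal Bloom filter for string keys (illustration only)."""

    def __init__(self, expected_items: int, false_positive_rate: float) -> None:
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        ln2 = math.log(2)
        self.num_bits = max(
            1, math.ceil(-expected_items * math.log(false_positive_rate) / ln2**2)
        )
        self.num_hashes = max(1, round(self.num_bits / expected_items * ln2))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # False means "definitely not seen"; True means "probably seen".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


seen = BloomFilter(expected_items=63_000, false_positive_rate=0.01)
seen.add("entity-42")
print("entity-42" in seen)  # True: an added item is always reported as present
```

With those numbers the filter comes out to roughly 75 KB and 7 hash positions per key. The catch is that a "probably seen" answer can be a false positive, so a hit would still need to be confirmed against the stored data; only a miss is definitive.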