Skip to content
This repository has been archived by the owner on Jan 23, 2025. It is now read-only.

Force counts_per_func to nearest prime > counts_per_func ? #66

Open
pik opened this issue Mar 18, 2015 · 0 comments
Open

Force counts_per_func to nearest prime > counts_per_func ? #66

pik opened this issue Mar 18, 2015 · 0 comments

Comments

@pik
Copy link

pik commented Mar 18, 2015

In the paper referenced for the bloom filter hashing algorithm used (https://github.com/bitly/dablooms/blob/master/src/dablooms.c#L185) (Kirsch, Mitzenmacher [2006]) - the recommended algorithm is given by:

gi(x)=h1(x)+ih2(x)(mod p)

From my reading it seems that their theoretical estimate for collision probabilities will only hold provided that counts_per_func is prime. It is also notable that 64 bits of entropy from MurmurHash3_x64_128 is discarded without being used. I'm not sure how significant the difference is on average, but it may make sense to either:
a) force counts_per_func to the nearest prime greater than counts_per_func
or
b) use a triple hashing algorithm taking advantage of the unused entropy provided by murmur hash

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

No branches or pull requests

1 participant