Bloom Filter Calculator
Here's an interactive tool to help you tune the parameters for a Bloom filter. If you want to learn more about how Bloom filters work, see separate article here.
Number of bits (m)
m =
In bytes:
Number of hash functions (k)
k =
This in the optimal choice for n =
Acceptable false positive rate (p)
p = %
Probability of false positives
Assuming uniform and independent hashing, the probability of a specific bit not being set to 1 after a single use of a hash function is
m
After inserting n elements using k hashes each, the probability of a specific bit still being 0 is thus
⎝
m
⎠
kn
Conversely, the probability of a specific bit being set to 1, is
⎝
m
⎠
kn
Since a false positive requires k random bits to be set to 1, the probability of a false positive after inserting n elements is
⎝
⎝
m
⎠
kn
⎠
k
Using the formula for calculating e…
e
x → ∞
⎝
x
⎠
−x
…we can approximate the probability of false positives as
1 − e
−kn / m
k
Optimal value for k
By fixing the n / m ratio in the previous equation, we can solve for k, and find the optimal value for it. That is, if you for example know you can spare 1 MB for the bit vector, and you know that you will insert ~10,000 elements, you can calculate the number of hash functions that minimizes the false positives in subsequent lookups.
Given n and m we want to minimize
1 − e
−kn / m
k
e
1 − e
−kn / m
k
e
k
1 − e
−kn / m
In other words, we want to minimize
k
1 − e
−kn / m
By simple rules of powers and logarithms, we know that k = −m / n ln(e−kn / m), so we can rewrite the above as follows:
m
n
e
−kn / m
1 − e
−kn / m
By symmetry, we see that this is minimized when e−kn / m = 1 − e−kn / m, i.e. when e−kn / m = 0.5. Intuitively this means that we should chose a k such that after insterting n elements, half of the m bits are set to true. Solving for k gives us
k
m ln 2
n
With this value for k the optimal false positive rate is approximately 0.6185m / n.