Bloom Filter Calculator

Here's an interactive tool to help you tune the parameters for a Bloom filter. If you want to learn more about how Bloom filters work, see separate article here.


Number of bits (m)

m = 

In bytes:

Number of hash functions (k)

k = 

This in the optimal choice for n = 

Not the optimal choice for any number of elements.

Acceptable false positive rate (p)

p =  %

Allows for up to elements.
Allows no elements to be added.
1 0 p n 1 2 3 4 5 6 7 8 9

Probability of false positives

Assuming uniform and independent hashing, the probability of a specific bit not being set to 1 after a single use of a hash function is

1
1

m

After inserting n elements using k hashes each, the probability of a specific bit still being 0 is thus


1
1

m


kn

Conversely, the probability of a specific bit being set to 1, is

1

1
1

m


kn

Since a false positive requires k random bits to be set to 1, the probability of a false positive after inserting n elements is


1

1
1

m


kn


k

Using the formula for calculating e

e

=
 
lim

x → ∞


1
1

x


x

…we can approximate the probability of false positives as

(

1 − e

kn / m

)

k

Optimal value for k

By fixing the n / m ratio in the previous equation, we can solve for k, and find the optimal value for it. That is, if you for example know you can spare 1 MB for the bit vector, and you know that you will insert ~10,000 elements, you can calculate the number of hash functions that minimizes the false positives in subsequent lookups.

Given n and m we want to minimize

= 
(

1 − e

kn / m

)

k

= 

e

ln 
(

1 − e

kn / m

)

k

= 

e

k

ln 
(

1 − e

kn / m

)

In other words, we want to minimize

k

ln 
(

1 − e

kn / m

)

By simple rules of powers and logarithms, we know that k = −m / n ln(ekn / m), so we can rewrite the above as follows:

− 

m

n

 ln 
(

e

kn / m

)
 ln 
(

1 − e

kn / m

)

By symmetry, we see that this is minimized when ekn / m = 1 − ekn / m, i.e. when ekn / m = 0.5. Intuitively this means that we should chose a k such that after insterting n elements, half of the m bits are set to true. Solving for k gives us

k

=

m ln 2

n

With this value for k the optimal false positive rate is approximately 0.6185m / n.

Comments

Be the first to comment!