Bloom Filter Calculator

Here's an interactive tool to help you tune the parameters for a Bloom filter. If you want to learn more about how Bloom filters work, see separate article here.

Number of bits (m)

m = 

In bytes:

Number of hash functions (k)

k = 

This in the optimal choice for n = 

Not the optimal choice for any number of elements.

Acceptable false positive rate (p)

p =  %

Allows for up to elements.

Allows no elements to be added.

Probability of false positives

Assuming uniform and independent hashing, the probability of a specific bit not being set to 1 after a single use of a hash function is

−

After inserting n elements using k hashes each, the probability of a specific bit still being 0 is thus

⎛
⎝

−

⎞
⎠

Conversely, the probability of a specific bit being set to 1, is

−

⎛
⎝

−

⎞
⎠

Since a false positive requires k random bits to be set to 1, the probability of a false positive after inserting n elements is

⎛
⎝

−

⎛
⎝

−

⎞
⎠

Using the formula for calculating e…

lim

x → ∞

⎛
⎝

−

⎞
⎠

−x

…we can approximate the probability of false positives as

(

1 − e

−kn / m

)

Optimal value for k

By fixing the n / m ratio in the previous equation, we can solve for k, and find the optimal value for it. That is, if you for example know you can spare 1 MB for the bit vector, and you know that you will insert ~10,000 elements, you can calculate the number of hash functions that minimizes the false positives in subsequent lookups.

Given n and m we want to minimize

(

1 − e

−kn / m

)

(

1 − e

−kn / m

)

(

1 − e

−kn / m

)

In other words, we want to minimize

(

1 − e

−kn / m

)

By simple rules of powers and logarithms, we know that k = −m / n ln(e^{−kn / m}), so we can rewrite the above as follows:

− 

(

−kn / m

)

(

1 − e

−kn / m

)

By symmetry, we see that this is minimized when e^{−kn / m} = 1 − e^{−kn / m}, i.e. when e^{−kn / m} = 0.5. Intuitively this means that we should chose a k such that after insterting n elements, half of the m bits are set to true. Solving for k gives us

m ln 2

With this value for k the optimal false positive rate is approximately 0.6185^m / n.

Comments

Be the first to comment!