# Bloom Filter Calculator

Here's an interactive tool to help you tune the parameters for a Bloom filter. If you want to learn more about how Bloom filters work, see separate article here.

**Number of bits ( m)**

*m* =

In bytes:

**Number of hash functions ( k)**

*k* =

This in the optimal choice for *n* =

**Acceptable false positive rate ( p)**

*p* = %

## Probability of false positives

Assuming uniform and independent hashing, the probability of a specific bit not being set to 1 after a single use of a hash function is

*m*

After inserting *n* elements using *k* hashes each, the probability of a specific bit still being 0 is thus

⎝

*m*

⎠

*kn*

Conversely, the probability of a specific bit being set to 1, is

⎝

*m*

⎠

*kn*

Since a false positive requires *k* random bits to be set to 1, the probability of a false positive after inserting *n* elements is

⎝

⎝

*m*

⎠

*kn*

⎠

*k*

Using the formula for calculating *e*…

*e*

*x* → ∞

⎝

*x*

⎠

−*x*

…we can approximate the probability of false positives as

1 − *e*

−*kn* / *m*

*k*

## Optimal value for k

By fixing the *n* / *m* ratio in the previous equation, we can solve for *k*, and find the optimal value for it. That is, if you for example know you can spare 1 MB for the bit vector, and you know that you will insert ~10,000 elements, you can calculate the number of hash functions that **minimizes the false positives** in subsequent lookups.

Given *n* and *m* we want to minimize

1 − *e*

−*kn* / *m*

*k*

*e*

1 − *e*

−*kn* / *m*

*k*

*e*

*k*

1 − *e*

−*kn* / *m*

In other words, we want to minimize

*k*

1 − *e*

−*kn* / *m*

By simple rules of powers and logarithms, we know that k = −*m* / *n* ln(e^{−kn / m}), so we can rewrite the above as follows:

*m*

*n*

*e*

−*kn* / *m*

1 − *e*

−*kn* / *m*

By symmetry, we see that this is minimized when e^{−kn / m} = 1 − e^{−kn / m}, i.e. when e^{−kn / m} = 0.5. Intuitively this means that we should chose a *k* such that after insterting *n* elements, half of the *m* bits are set to true. Solving for *k* gives us

*k*

*m* ln 2

*n*

With this value for *k* the optimal false positive rate is approximately 0.6185^{m / n}.