Why large prime numbers are used in hash tables

Hash tables store objects in buckets. Which bucket an object ends up in depends on the hash value for the object.

To transform a hash value to a valid bucket index you typically just compute modulo numBuckets where numBuckets represents the number of buckets in the hash table.

Hash tables perform best when objects are spread evenly among the buckets. A hash function should therefore strive for this goal.

Large hash values

A hash function that only returns small values, say between 0 and 10, does not achieve a good spread if numBuckets becomes greater than 10.

To maintain a good spread when there's a large number of buckets, hash functions typically scale up the values by multiplying with a large prime number.

Why prime numbers?

If we scaled up the values by a composite number (non-prime), say 1000, there would be many values for numBuckets where the scale would be "cancelled out" in the mod operation. 1000 % numBuckets = 0 for numBuckets = 2, 4, 5, 8, 10, … In other words, similar objects would get similar bucket indexes. This reduces the spread and makes for a poor hash function.

By choosing a large prime number, we avoid this problem, since a prime number is not divisible by numBuckets unless numBuckets happens to be a multiple of that particular prime number.

Comments

Be the first to comment!