Why large prime numbers are used in hash tables
Hash tables store objects in buckets. Which bucket an object ends up in depends on the hash value for the object.
To transform a hash value to a valid bucket index you typically just compute modulo numBuckets
where numBuckets
represents the number of buckets in the hash table.
Hash tables perform best when objects are spread evenly among the buckets. A hash function should therefore strive for this goal.
Large hash values
A hash function that only returns small values, say between 0 and 10, does not achieve a good spread if numBuckets
becomes greater than 10.
To maintain a good spread when there's a large number of buckets, hash functions typically scale up the values by multiplying with a large prime number.
Why prime numbers?
If we scaled up the values by a composite number (non-prime), say 1000, there would be many values for numBuckets
where the scale would be "cancelled out" in the mod operation. 1000 % numBuckets
= 0 for numBuckets
= 2, 4, 5, 8, 10, … In other words, similar objects would get similar bucket indexes. This reduces the spread and makes for a poor hash function.
By choosing a large prime number, we avoid this problem, since a prime number is not divisible by numBuckets
unless numBuckets
happens to be a multiple of that particular prime number.