What Is Double Hashing and How Does It Work?

Double hashing is a collision resolution technique for hash tables that uses two separate hash functions to determine where to store data. When two keys land on the same slot (a collision), the second hash function calculates a step size, and the algorithm jumps forward by that many slots until it finds an empty one. This makes it one of the most effective forms of open addressing, producing more evenly spread probe sequences than simpler alternatives like linear or quadratic probing.

How the Two Hash Functions Work Together

Every hash table uses a primary hash function to convert a key into an array index. In double hashing, a second hash function kicks in only when a collision occurs. The combined formula looks like this:

position = (h1(key) + i × h2(key)) % table_size

Here, h1(key) gives the initial slot, h2(key) gives the step size, and i counts the number of attempts (0, 1, 2, 3…). On the first try, i is 0, so you just check the home position. If that slot is taken, i becomes 1, and you jump forward by h2(key) slots. If that’s taken too, i becomes 2, and you jump forward by another h2(key) slots, and so on.

A common choice for the second hash function is h2(key) = 1 + (key % (table_size - 1)). The “1 +” at the front guarantees the step size is never zero, which would trap the probe in an infinite loop on the same slot.
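The formula and the step-size rule above can be sketched in a few lines of Python. This is an illustrative sketch assuming integer keys; the function name `probe_sequence` is not from the original text:

```python
def probe_sequence(key, table_size):
    """Yield the slots double hashing would visit for an integer key.

    h1(key) = key % table_size gives the home slot; the common choice
    h2(key) = 1 + (key % (table_size - 1)) gives the step size.
    """
    h1 = key % table_size
    h2 = 1 + (key % (table_size - 1))  # the "1 +" keeps the step size nonzero
    for i in range(table_size):        # at most table_size distinct slots
        yield (h1 + i * h2) % table_size
```

For example, `list(probe_sequence(27, 13))` starts with slots 1, 5, 9, 0 and, because 13 is prime, eventually covers all thirteen slots.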

Why It Beats Linear and Quadratic Probing

Linear probing handles collisions by simply checking the next slot, then the next, then the next. This creates runs of occupied slots called primary clustering: once a clump forms, new keys that hash anywhere near it pile onto the end, making the clump grow faster. Lookups slow down because you have to scan through long chains of filled slots.

Quadratic probing improves on this by jumping 1, 4, 9, 16… slots instead of 1, 2, 3, 4. That breaks up primary clusters, but introduces a subtler problem called secondary clustering. Any two keys that hash to the same initial slot will follow the exact same sequence of jumps, because the step pattern depends only on the attempt number, not the key itself.

Double hashing addresses both problems. Because the step size comes from a second hash function applied to the key, two keys that collide at the same home position will almost certainly have different step sizes. They scatter in different directions instead of following the same path. This keeps the table’s slots more uniformly occupied and shortens the average probe sequence.
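The contrast is easy to see in code. The sketch below (table size and hash choices are illustrative, not prescribed by the text) compares the slots two colliding keys visit under quadratic probing versus double hashing:

```python
TABLE_SIZE = 13

def quadratic_probes(key, n=5):
    """First n slots under quadratic probing: jumps depend only on the attempt."""
    h1 = key % TABLE_SIZE
    return [(h1 + i * i) % TABLE_SIZE for i in range(n)]

def double_hash_probes(key, n=5):
    """First n slots under double hashing: the step size depends on the key."""
    h1 = key % TABLE_SIZE
    h2 = 1 + (key % (TABLE_SIZE - 1))
    return [(h1 + i * h2) % TABLE_SIZE for i in range(n)]

# 14 and 27 both hash to home slot 1 (14 % 13 == 27 % 13 == 1).
print(quadratic_probes(14), quadratic_probes(27))      # identical sequences
print(double_hash_probes(14), double_hash_probes(27))  # diverge after slot 1
```

Both keys share the same quadratic sequence (secondary clustering), while their double-hashing sequences split immediately after the shared home slot.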

Why the Table Size Should Be Prime

Double hashing has one critical requirement that’s easy to overlook: the table size should be a prime number. If it isn’t, certain step sizes can cycle through only a fraction of the slots, leaving the rest permanently unreachable.

Consider a table of size 15 where a key hashes to slot 0 with a step size of 5. The probe sequence visits 0, 5, 10, 0, 5, 10, looping forever through just three slots. That’s because 5 divides evenly into 15, so the sequence wraps around without ever reaching the other 12 positions.

Now change the table size to 13 (a prime number) and use the same step size of 5. The sequence becomes 0, 5, 10, 2, 7, 12, 4, 9, 1, 6, 11, 3, 8, visiting every single slot before repeating. A prime table size guarantees that no step size between 1 and table_size − 1 shares a common factor with it, so the probe sequence always covers the entire table.
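A short sketch makes the cycling concrete (the helper name is illustrative):

```python
def slots_reachable(table_size, start, step):
    """Slots visited by repeatedly jumping `step` from `start`, until a repeat."""
    seen, pos = [], start
    while pos not in seen:
        seen.append(pos)
        pos = (pos + step) % table_size
    return seen

print(slots_reachable(15, 0, 5))  # [0, 5, 10] -- only 3 of the 15 slots
print(slots_reachable(13, 0, 5))  # all 13 slots, in the order listed above
```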

Performance and Load Factor

The load factor of a hash table is the ratio of filled slots to total slots. A table with 7 items in 10 slots has a load factor of 0.7. Performance degrades as this number approaches 1.0, and the relationship is not linear.

For an unsuccessful search (looking for a key that isn’t in the table), the expected number of probes is roughly 1 / (1 − load factor). At a load factor of 0.5, that’s about 2 probes. At 0.8, it’s 5. At 0.9, it jumps to 10. Insertion follows the same formula, since you’re effectively searching for the first empty slot.

Successful searches are cheaper. The expected probe count is approximately (1/α) × ln(1/(1 − α)), where α is the load factor. At 0.5 load, that works out to roughly 1.4 probes. At 0.8, about 2.0.

These formulas assume uniform hashing, which double hashing approximates well. In practice, most implementations resize the table (typically to roughly double its size, rounded up to the next prime) once the load factor crosses a threshold like 0.7 or 0.75.
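The expected-probe formulas above are easy to tabulate directly. A small sketch (function names are illustrative):

```python
import math

def expected_probes_unsuccessful(load):
    """Expected probes for an unsuccessful search: 1 / (1 - a)."""
    return 1 / (1 - load)

def expected_probes_successful(load):
    """Expected probes for a successful search: (1/a) * ln(1/(1 - a))."""
    return (1 / load) * math.log(1 / (1 - load))

for a in (0.5, 0.8, 0.9):
    print(a,
          round(expected_probes_unsuccessful(a), 1),
          round(expected_probes_successful(a), 1))
```

Running this reproduces the figures quoted above: about 2, 5, and 10 probes for unsuccessful searches, versus roughly 1.4 and 2.0 probes for successful ones at loads of 0.5 and 0.8.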

The Cache Trade-off

One area where double hashing can lose ground is cache performance. Modern CPUs load memory in chunks (cache lines), so accessing consecutive slots, as linear probing does, is fast because the nearby slots are likely already in the cache. Double hashing jumps by larger, less predictable intervals, which means more cache misses per probe.

In practice, this trade-off depends on the workload. Research from Greg Heileman and Wenbin Luo found that the savings from fewer collisions in double hashing usually compensate for the extra cache misses, making it the better overall choice in many scenarios. However, at low load factors where collisions are rare anyway, linear probing’s cache friendliness can give it an edge. The right choice depends on your expected table density and access patterns.

Handling Deletions

Deleting an entry from a double-hashed table is trickier than it sounds. You can’t simply empty the slot, because other keys may have been inserted after probing past that slot. If you clear it, any search for those later keys will hit the empty slot and incorrectly conclude the key doesn’t exist.

The standard solution is tombstone deletion (also called lazy deletion). Instead of truly emptying the slot, you mark it with a special “deleted” flag. During a lookup, tombstones are treated as occupied, so the search keeps probing past them. During insertion, tombstones are treated as empty, so new keys can reclaim the space. Over time, if too many tombstones accumulate, the table can be rebuilt to clear them out.
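A minimal sketch of tombstone deletion might look like this in Python (integer keys, fixed prime capacity, no resizing; the class and sentinel names are illustrative, not from the text):

```python
EMPTY, TOMBSTONE = object(), object()  # sentinel markers for slot state

class DoubleHashTable:
    """Toy open-addressing table with tombstone (lazy) deletion."""

    def __init__(self, size=13):
        self.size = size
        self.slots = [EMPTY] * size

    def _probes(self, key):
        h1 = key % self.size
        h2 = 1 + (key % (self.size - 1))
        return ((h1 + i * h2) % self.size for i in range(self.size))

    def insert(self, key):
        first_free = None
        for pos in self._probes(key):
            slot = self.slots[pos]
            if slot is TOMBSTONE:
                if first_free is None:
                    first_free = pos          # reclaimable, but keep scanning
            elif slot is EMPTY:
                target = first_free if first_free is not None else pos
                self.slots[target] = key
                return
            elif slot == key:
                return                        # already present
        if first_free is not None:
            self.slots[first_free] = key

    def contains(self, key):
        for pos in self._probes(key):
            slot = self.slots[pos]
            if slot is EMPTY:                 # truly empty: key can't be further
                return False
            if slot is not TOMBSTONE and slot == key:
                return True
        return False

    def delete(self, key):
        for pos in self._probes(key):
            slot = self.slots[pos]
            if slot is EMPTY:
                return
            if slot is not TOMBSTONE and slot == key:
                self.slots[pos] = TOMBSTONE   # mark, don't empty
                return
```

Note that `contains` probes past tombstones but stops at a truly empty slot, while `insert` remembers the first tombstone it sees and reclaims it only after confirming the key isn't already further along the chain.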

A Worked Example

Suppose you have a table of size 13 and two hash functions: h1(key) = key % 13 and h2(key) = 1 + (key % 12). You want to insert the key 27.

First, compute the home position: 27 % 13 = 1. If slot 1 is empty, you’re done. If it’s occupied, compute the step size: 1 + (27 % 12) = 1 + 3 = 4. Now probe slot (1 + 1×4) % 13 = 5. If that’s taken, try (1 + 2×4) % 13 = 9, then (1 + 3×4) % 13 = 0, and so on. Because 13 is prime, this sequence will eventually visit every slot.

Now insert key 40. Home position: 40 % 13 = 1 (same as 27). Step size: 1 + (40 % 12) = 1 + 4 = 5. The probe sequence is 1, 6, 11, 3, 8… completely different from 27’s sequence, even though both keys collided at slot 1. That’s the core advantage of double hashing in action.
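The arithmetic above can be checked directly with the example's two hash functions (the helper name is just for illustration):

```python
def probes(key, n, table_size=13):
    """First n probe positions using the worked example's hash functions."""
    h1 = key % table_size              # h1(key) = key % 13
    h2 = 1 + (key % (table_size - 1))  # h2(key) = 1 + (key % 12)
    return [(h1 + i * h2) % table_size for i in range(n)]

print(probes(27, 4))  # [1, 5, 9, 0]
print(probes(40, 5))  # [1, 6, 11, 3, 8]
```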

Where Double Hashing Is Used

Double hashing shows up primarily in database indexing, where fast key lookups on large datasets are essential. It’s also used in some language runtime implementations and custom hash table libraries where developers need fine-grained control over collision behavior. General-purpose standard libraries (like Python’s dict or Java’s HashMap) tend to use other strategies, such as chaining or Robin Hood hashing variants, but double hashing remains a common choice in systems programming and embedded environments where memory overhead needs to stay minimal and predictable.