← Course Index

Consistent Hashing

~25 min · Key Pattern · Alex Xu Vol 1, Ch 5

Ref
Primary Source
Alex Xu — System Design Interview Vol 1, Chapter 5

Alex Xu's chapter is the clearest introduction to consistent hashing. Also watch Gaurav Sen's YouTube video "Consistent Hashing" for an animated walkthrough.

The Problem with Naive Hashing

With naive modulo hashing (server = hash(key) % N), adding or removing a server changes N — which remaps almost every key to a different server. If you have 1 million cached items and add one server to a pool of 4, about 75% of items get remapped. Your cache hit rate collapses.

Naive hashing problem
hash("user:123") % 4 = 2 → Server 2
hash("user:456") % 4 = 1 → Server 1

Add a 5th server (N changes to 5):
hash("user:123") % 5 = 3 → Server 3 ← MOVED!
hash("user:456") % 5 = 1 → Server 1 ← ok

~80% of keys remapped → cache miss storm

The Hash Ring

Consistent hashing maps both servers and keys onto the same hash space — imagined as a ring from 0 to 2³² (or some large integer). A key is assigned to the first server encountered moving clockwise around the ring.

0 2³²/4 2³²/2 3×2³²/4 A B C D key1→B key2→C key3→D key4→A Hash ring (0 → 2³²) Keys go clockwise to the next server on the ring
The consistent hash ring — servers and keys share the same hash space
Why this is better

When you add server E to the ring, only the keys between server D and E need to remapped — to server E. All other keys stay put. On average, only K/N keys get remapped (K = total keys, N = number of servers).

Virtual Nodes (vnodes)

One problem with basic consistent hashing: servers don't land evenly on the ring. One server might "own" 60% of the ring, another only 10%. Solution: give each physical server multiple positions on the ring — called virtual nodes.

Without virtual nodes
Server A: owns arc 0–25%
Server B: owns arc 25–80%
Server C: owns arc 80–100%

B handles 55% of traffic!
Uneven distribution.
With virtual nodes (100 vnodes each)
Server A: 100 positions on ring
Server B: 100 positions on ring
Server C: 100 positions on ring

Each server ends up owning
~33% of the ring on average.
Much more even.

With more virtual nodes, distribution becomes more uniform. The trade-off: more memory to store the vnode-to-server mapping. Typical production values: 100–1000 vnodes per server.

Where Consistent Hashing Is Used

SystemWhat gets hashedWhy it matters
Redis ClusterCache keys → shardsAdding a cache node only migrates 1/N of keys
CassandraPartition keys → nodesAdding nodes redistributes data minimally
Load balancersClient IP → backend serverSticky sessions without server-side state
CDN (Akamai)Content URL → edge serverContent stays on the same edge; cache warm
Amazon DynamoDBPartition key → storage nodeAutomatic scaling with minimal data movement

Check Your Understanding

1. You have 4 cache servers and 1 million cached items. You add a 5th server using naive hash (% N). Approximately how many items need to be remapped?
2. With consistent hashing (no virtual nodes), you add 1 server to a ring of 4. Approximately what fraction of keys need to be remapped?
3. Why are virtual nodes used in consistent hashing?

🎓 Consistent hashing appears in multiple case studies. Ask me how it's used in the Key-Value Store design (Lesson 20), or how Cassandra uses it with replication factor to ensure fault tolerance.