
Design a Key-Value Store

~25 min · Case Studies · Alex Xu Vol 1, Ch 6 · DDIA §5–6

Primary source: Alex Xu Vol 1, Chapter 6, "Design a Key-Value Store", and DDIA §5–6

This lesson explains decentralized, masterless database design (as in Amazon's Dynamo or Apache Cassandra), focusing on consistency, replication, and node coordination.

What is a Masterless KV Store?

In single-leader replication (as in a typical MySQL setup), one leader node processes all writes and replicates them to follower nodes. In a **masterless** key-value store (modeled after Amazon's Dynamo paper), any node can accept read or write requests. This structure maximizes write availability and scalability at the cost of a more complex consistency model.

Key Design Building Blocks

A distributed key-value store is built on a stack of specific patterns. This lesson covers two of them: quorum consensus and conflict resolution with vector clocks.

Quorum Consensus (R + W > N)

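In masterless stores, we tune consistency using three numbers: N, the number of replicas that store each key; W, the number of replicas that must acknowledge a write before it counts as successful; and R, the number of replicas that must respond before a read returns.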

[Figure: a coordinator node sends each write or read to Replica Node A (OK), Replica Node B (OK), and Replica Node C (slow).]
Quorum with N=3. If W=2, we only wait for A and B to respond before returning success to the client.
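
The effect of W on write latency can be sketched with a small simulation. This is a minimal illustration, not how any particular store implements its coordinator: the function name `quorum_write` and the latency figures are assumptions made up for the example. The write completes as soon as the W-th fastest replica has acknowledged, so a slow replica only matters when W is large enough to include it.

```python
from typing import Dict, List, Tuple


def quorum_write(latencies_ms: Dict[str, float], w: int) -> Tuple[float, List[str]]:
    """Return (time until the write succeeds, replicas that acknowledged in time).

    latencies_ms maps a replica name to its (hypothetical) acknowledgement time.
    """
    if not 1 <= w <= len(latencies_ms):
        raise ValueError("W must be between 1 and N")
    # Order replicas by how quickly they acknowledge the write.
    by_speed = sorted(latencies_ms.items(), key=lambda item: item[1])
    acked = by_speed[:w]
    # The coordinator returns once the W-th acknowledgement arrives.
    return acked[-1][1], [name for name, _ in acked]


# Hypothetical latencies matching the figure: A and B are healthy, C is slow.
replicas = {"A": 5.0, "B": 8.0, "C": 250.0}

latency, acked = quorum_write(replicas, w=2)
print(latency, acked)  # 8.0 ['A', 'B']
```

With `w=3` the same call would have to wait 250 ms for replica C, which is why a smaller W improves write latency and availability.
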
| Quorum Configuration | Consistency Level | Use Case |
| --- | --- | --- |
| R + W > N (e.g. N=3, W=2, R=2) | Strong Consistency | Default setting. Guaranteed that the read quorum overlaps with the write quorum, so reads always return the latest value. |
| W = 1, R = 1 (N=3) | Eventual Consistency | Highly optimized for write speed and read speed. Replicas synchronize in the background. Risk of reading stale values. |
| W = N, R = 1 | Strong Consistency (Read optimized) | Fast reads, but writes fail if even a single replica node goes offline. |
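
The overlap claim behind R + W > N can be checked by brute force. The sketch below (the function name `quorums_always_overlap` is chosen for this example) enumerates every possible write quorum and read quorum for a small N and tests whether each pair shares at least one replica. A shared replica is what lets a read see the most recently acknowledged write.

```python
from itertools import combinations


def quorums_always_overlap(n: int, w: int, r: int) -> bool:
    """True if every write quorum of size w shares a replica with every read quorum of size r."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)  # an empty intersection is falsy
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


# The three configurations from the table above, all with N=3.
for n, w, r in [(3, 2, 2), (3, 1, 1), (3, 3, 1)]:
    print(f"N={n} W={w} R={r} overlap={quorums_always_overlap(n, w, r)} R+W>N={r + w > n}")
```

For every configuration the enumerated result agrees with the simple R + W > N test, which follows from the pigeonhole principle: two sets whose sizes sum to more than N cannot be disjoint inside N replicas.
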

Conflict Resolution: Vector Clocks

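When multiple nodes accept concurrent writes for the same key, the store needs a way to resolve the conflict. Databases like Cassandra use **Last-Write-Wins (LWW)**, which keeps the write with the highest timestamp and discards the others. Because the timestamps come from wall clocks, the result is sensitive to clock drift between machines, even when they are synchronized with NTP.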
Dynamo uses **Vector Clocks**: a list of [server, counter] pairs attached to every version of a data object.
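
To see why timestamp-based resolution is fragile, consider a minimal sketch of Last-Write-Wins. The `Write` class, the `lww_merge` function, and the clock offsets are assumptions made up for this illustration and do not mirror any database's internal API. One node's clock runs 50 ms fast, so an older write carries a larger timestamp and beats a write that actually happened later.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    value: str
    timestamp_ms: int  # wall-clock time stamped on the write


def lww_merge(a: Write, b: Write) -> Write:
    """Keep the write with the larger timestamp; the other is silently dropped."""
    return a if a.timestamp_ms >= b.timestamp_ms else b


# Real order of events: "first" at t=1000 ms, then "second" at t=1020 ms.
# The node that stamped "first" has a clock running 50 ms ahead (hypothetical skew).
first = Write("first", timestamp_ms=1_000 + 50)
second = Write("second", timestamp_ms=1_020)

winner = lww_merge(first, second)
print(winner.value)  # first
```

The later write, "second", is lost without any error being reported. Vector clocks avoid this failure mode because they track causality with per-server counters instead of wall-clock time.
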

Vector Clock Example

Client writes V1. Server Sx processes it:
State: V1, Clock: [Sx, 1]

Client edits V1 to V2. Server Sy processes it:
State: V2, Clock: [Sx, 1], [Sy, 1]

If two clients read V2 and make concurrent, conflicting edits:
Client A writes to Sx → [Sx, 2], [Sy, 1]
Client B writes to Sz → [Sx, 1], [Sy, 1], [Sz, 1]

Resolving Conflicts (Siblings)

When the next reader fetches the key:
1. The server detects that the clocks conflict (neither is a direct ancestor of the other).
2. The server returns both versions (siblings) to the client application.
3. The client app must merge the values (e.g., merging shopping cart items) and write the merged version back to the store.
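
The example above can be reproduced with a small vector clock sketch. The helper names (`increment`, `descends`, `compare`, `merge_clocks`), the cart contents, and the choice of Sx as the server that handles the merged write are all assumptions for illustration. A clock is modeled as a dictionary from server name to counter, and one version supersedes another only if every counter is at least as large.

```python
from typing import Dict

Clock = Dict[str, int]


def increment(clock: Clock, server: str) -> Clock:
    """Return a copy of the clock with the handling server's counter bumped."""
    updated = dict(clock)
    updated[server] = updated.get(server, 0) + 1
    return updated


def descends(a: Clock, b: Clock) -> bool:
    """True if clock a has seen every event recorded in clock b."""
    return all(a.get(server, 0) >= count for server, count in b.items())


def compare(a: Clock, b: Clock) -> str:
    if a == b:
        return "equal"
    if descends(a, b):
        return "a supersedes b"
    if descends(b, a):
        return "b supersedes a"
    return "concurrent"  # neither is an ancestor: the versions are siblings


def merge_clocks(a: Clock, b: Clock) -> Clock:
    """Element-wise maximum, so the merged version descends from both siblings."""
    return {s: max(a.get(s, 0), b.get(s, 0)) for s in a.keys() | b.keys()}


v1 = increment({}, "Sx")       # {Sx: 1}
v2 = increment(v1, "Sy")       # {Sx: 1, Sy: 1}
clock_a = increment(v2, "Sx")  # Client A via Sx: {Sx: 2, Sy: 1}
clock_b = increment(v2, "Sz")  # Client B via Sz: {Sx: 1, Sy: 1, Sz: 1}
print(compare(clock_a, clock_b))  # concurrent

# The client merges the sibling values (hypothetical carts) and writes back via Sx.
merged_cart = {"milk", "eggs"} | {"milk", "bread"}
merged_clock = increment(merge_clocks(clock_a, clock_b), "Sx")
print(compare(merged_clock, clock_a), "|", compare(merged_clock, clock_b))
```

After the merge, the new clock has Sx at 3 with Sy and Sz at 1, so it supersedes both siblings and later readers see a single version again.
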

Check Your Understanding

1. In a masterless key-value store with replication factor N=3, write quorum W=2, and read quorum R=2, what consistency guarantee does the client receive?
2. What is the role of Merkle Trees in a distributed database like Dynamo or Cassandra?
3. How does a decentralized system use Vector Clocks?