Alex Xu Vol 1, Chapter 7 — "Design a Unique ID Generator in Distributed Systems"
These notes cover UUIDs, ticket servers, and Snowflake IDs, plus the distributed coordination (ZooKeeper) that such systems rely on.
The Problem with Auto-Increment
In a single-instance relational database, auto-incrementing primary keys (1, 2, 3...) work perfectly. But in a distributed system with multiple databases running across datacenters, this fails:
Write bottleneck: A single central database generating IDs becomes a single point of failure (SPOF) and a throughput bottleneck.
Collisions: If multiple database shards independently auto-increment, they will produce duplicate IDs (e.g., Shard A and Shard B both generate ID 15).
Security / Privacy: Sequential IDs make it trivial for attackers to scrape your data (e.g., /users/1001, /users/1002) or guess transaction volumes.
Common ID Generation Approaches
Here are the primary ways to solve this in distributed architecture, along with their trade-offs:
| Approach | Pros | Cons |
|---|---|---|
| UUID (v4): 128-bit random value | ✓ Generated locally (no network delay) ✓ No coordination needed, so it scales freely | ✗ Large (128-bit) size hurts index performance ✗ Not time-sortable (bad for DB clustering) |
| Multi-Master Replication: each DB increments by K (the number of masters) | ✓ Easy to build on top of existing SQL setups | ✗ Hard to scale out (adding or removing masters requires database reconfiguration) |
| Ticket Server: centralized SQL DB dedicated to increments (Flickr) | ✓ Simple, generates sequential 64-bit IDs | ✗ A network call for every write adds latency ✗ Ticket server is a SPOF (needs redundant ticket servers) |
| Twitter Snowflake: time-sortable 64-bit IDs | ✓ Ordered by time (mostly) ✓ 64-bit size is index-friendly ✓ Fully distributed generation | ✗ Requires system clock synchronization (NTP) ✗ Slightly more complex to configure |
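The multi-master row can be made concrete with a small simulation. This is a minimal sketch, not a database configuration: `master_id_stream` is a hypothetical helper, and it assumes masters are numbered 1..K so that each one starts at its own index and steps by K.

```python
from itertools import count, islice


def master_id_stream(master_index: int, num_masters: int):
    """Return an endless ID stream for one master: start at its index, step by K."""
    if not 1 <= master_index <= num_masters:
        raise ValueError("master_index must be in 1..num_masters")
    return count(start=master_index, step=num_masters)


K = 3  # number of masters (illustrative value)
streams = [master_id_stream(i, K) for i in range(1, K + 1)]

# First four IDs from each master:
#   master 1 -> 1, 4, 7, 10
#   master 2 -> 2, 5, 8, 11
#   master 3 -> 3, 6, 9, 12
first_ids = [list(islice(stream, 4)) for stream in streams]
```

The streams never overlap while K is fixed. Changing K changes every master's step, which is why adding or removing a master is painful.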
Twitter Snowflake ID Anatomy
Snowflake IDs partition a 64-bit integer into distinct fields. Because the timestamp occupies the most significant bits, comparing two IDs as integers compares their timestamps first, so the IDs sort roughly chronologically.
Layout of a 64-bit Twitter Snowflake ID
Sign bit (1 bit): Always 0, which keeps the ID positive when stored as a signed 64-bit integer. Reserved for future use.
Timestamp (41 bits): Milliseconds elapsed since a custom epoch (e.g., your company's launch date). 41 bits cover 2^41 ms, roughly 69 years.
Datacenter ID (5 bits): Allows up to 32 datacenters (2^5).
Worker ID (5 bits): Allows up to 32 worker machines per datacenter.
Sequence number (12 bits): Increments for every ID generated on that machine within the same millisecond. Resets to 0 every millisecond. Supports 4,096 IDs per millisecond per machine.
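The field layout above maps directly onto bit shifts. The sketch below packs and unpacks the four fields and checks the "roughly 69 years" figure. The function names `pack` and `unpack` are illustrative, not taken from any Snowflake library.

```python
# Field widths from the layout above. The sign bit is the implicit top bit and stays 0.
TIMESTAMP_BITS, DATACENTER_BITS, WORKER_BITS, SEQUENCE_BITS = 41, 5, 5, 12

WORKER_SHIFT = SEQUENCE_BITS                          # 12
DATACENTER_SHIFT = WORKER_SHIFT + WORKER_BITS         # 17
TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS  # 22


def pack(timestamp_ms: int, datacenter_id: int, worker_id: int, sequence: int) -> int:
    """Combine the four fields into one 63-bit non-negative integer."""
    fields = (
        ("timestamp_ms", timestamp_ms, TIMESTAMP_BITS),
        ("datacenter_id", datacenter_id, DATACENTER_BITS),
        ("worker_id", worker_id, WORKER_BITS),
        ("sequence", sequence, SEQUENCE_BITS),
    )
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
    return (
        (timestamp_ms << TIMESTAMP_SHIFT)
        | (datacenter_id << DATACENTER_SHIFT)
        | (worker_id << WORKER_SHIFT)
        | sequence
    )


def unpack(snowflake_id: int) -> tuple:
    """Split an ID back into (timestamp_ms, datacenter_id, worker_id, sequence)."""
    return (
        snowflake_id >> TIMESTAMP_SHIFT,
        (snowflake_id >> DATACENTER_SHIFT) & ((1 << DATACENTER_BITS) - 1),
        (snowflake_id >> WORKER_SHIFT) & ((1 << WORKER_BITS) - 1),
        snowflake_id & ((1 << SEQUENCE_BITS) - 1),
    )


# Capacity check: 2^41 milliseconds expressed in 365-day years (about 69.7).
MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365
years_of_timestamps = (1 << TIMESTAMP_BITS) / MS_PER_YEAR
```

An ID with a later timestamp is always numerically larger, whatever the other fields hold. That property is what makes the IDs time-sortable.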
💡 Clock Drift Warning
If a machine's clock runs backward (clock drift / NTP adjustments), a Snowflake generator could produce duplicate IDs. Real-world implementations check if the current timestamp is less than the last timestamp, and if so, wait for the clock to catch up or raise an error.
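A generator that applies the clock check described above might look like the following. It is a minimal sketch under stated assumptions, not Twitter's implementation: `CUSTOM_EPOCH_MS` is an arbitrary choice (2020-01-01 UTC), the class and exception names are hypothetical, and the `clock` parameter exists only so the behavior can be exercised with a fake clock. This version raises on a backward clock; waiting for the clock to catch up is the other option the text mentions.

```python
import threading
import time

CUSTOM_EPOCH_MS = 1_577_836_800_000  # assumption: 2020-01-01T00:00:00Z, chosen arbitrarily
SEQUENCE_MASK = (1 << 12) - 1        # 12-bit sequence, values 0..4095


class ClockMovedBackwardsError(RuntimeError):
    """Raised when the clock reports a time earlier than the last ID's timestamp."""


class SnowflakeGenerator:
    def __init__(self, datacenter_id: int, worker_id: int, clock=None):
        if not 0 <= datacenter_id < 32 or not 0 <= worker_id < 32:
            raise ValueError("datacenter_id and worker_id must each fit in 5 bits")
        self._datacenter_id = datacenter_id
        self._worker_id = worker_id
        # clock returns the current time in milliseconds; injectable for testing.
        self._clock = clock or (lambda: int(time.time() * 1000))
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self) -> int:
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Reusing an earlier millisecond could repeat an ID, so refuse.
                raise ClockMovedBackwardsError(
                    f"clock moved back by {self._last_ms - now} ms"
                )
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & SEQUENCE_MASK
                if self._sequence == 0:
                    # All 4,096 values used in this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0  # new millisecond, so the sequence restarts
            self._last_ms = now
            return (
                ((now - CUSTOM_EPOCH_MS) << 22)
                | (self._datacenter_id << 17)
                | (self._worker_id << 12)
                | self._sequence
            )


if __name__ == "__main__":
    generator = SnowflakeGenerator(datacenter_id=1, worker_id=7)
    print(generator.next_id())
```

The lock makes the generator safe to share between threads in one process. Uniqueness across machines still depends on every machine having a distinct (datacenter ID, worker ID) pair.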
Distributed Coordination (ZooKeeper)
When you run many service instances, they need coordination to avoid conflicts. ZooKeeper is a widely used choice for this: it exposes a hierarchical, consistent key-value namespace (a tree of nodes called znodes).
Key ZooKeeper Use Cases:
Leader Election: Automatically selecting a master server in a cluster. If the master dies, its session times out for lack of heartbeats, ZooKeeper deletes the ephemeral node the master held, and the remaining servers elect a new leader.
Distributed Locks: Mutexes that span multiple servers, preventing parallel workers from editing the same resource at the same time.
Service Registry / Configuration Management: Keeping track of which microservices are healthy and what their configurations are.
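The leader-election idea can be illustrated without a real cluster. The sketch below is an in-memory model, not the ZooKeeper API: `FakeCoordinator` and its methods are hypothetical names. It imitates the common pattern in which each candidate creates an ephemeral node with an increasing sequence number, the lowest number is the leader, and an expired session removes that candidate's node.

```python
from typing import Dict, Optional


class FakeCoordinator:
    """In-memory stand-in for a ZooKeeper-like service (illustration only)."""

    def __init__(self) -> None:
        self._next_seq = 0
        self._nodes: Dict[int, str] = {}  # sequence number -> owning session id

    def create_ephemeral_sequential(self, session_id: str) -> int:
        """Register a candidate and return the sequence number it was assigned."""
        seq = self._next_seq
        self._next_seq += 1
        self._nodes[seq] = session_id
        return seq

    def expire_session(self, session_id: str) -> None:
        """Model a missed-heartbeat timeout: drop every node owned by the session."""
        self._nodes = {
            seq: owner for seq, owner in self._nodes.items() if owner != session_id
        }

    def leader(self) -> Optional[str]:
        """The candidate holding the lowest live sequence number leads."""
        if not self._nodes:
            return None
        return self._nodes[min(self._nodes)]


coordinator = FakeCoordinator()
for server in ("server-a", "server-b", "server-c"):
    coordinator.create_ephemeral_sequential(server)

initial_leader = coordinator.leader()       # server-a registered first
coordinator.expire_session("server-a")      # server-a crashes; its node disappears
leader_after_crash = coordinator.leader()   # server-b has the next-lowest number
```

A real deployment would rely on a client library's election recipe and on watches for change notifications. This model only shows why ephemeral nodes make failure detection and failover automatic.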
💡 Interview default
Avoid building your own consensus or coordination logic. If asked how nodes agree on a leader or detect node failures in a cluster, say: "I'll use ZooKeeper to manage service states, ephemeral node heartbeats, and leader election."
Check Your Understanding
1. Why is Twitter Snowflake preferred over UUID v4 for database primary keys in high-throughput systems?
2. How does a Snowflake ID generator handle generating multiple IDs in the exact same millisecond?
3. What ZooKeeper feature is primarily used to detect that a cluster node has crashed?