← Course Index

Design a URL Shortener

~25 min · Case Studies · Alex Xu Vol 1, Ch 8

Ref
Primary Source
Alex Xu Vol 1, Chapter 8 — "Design a URL Shortener"

The classic system design exercise covering redirection mechanics, hash collisions, base conversion, and read-heavy caching.

Problem Requirements & Estimation

A URL shortener (like bit.ly) maps a long URL (e.g., a 200-character link) into a short alias (e.g., https://tiny.com/aBc123xyz). Let's establish basic parameters for the interview:

Redirection Mechanics: 301 vs 302

When a user visits a short URL, the server responds with a redirect status code. Which one you choose is a vital design trade-off:

301 Moved Permanently
Client's browser caches the redirect mapping. 
Subsequent requests bypass the shortener server 
completely, loading the long URL from cache.

✓ Reduces traffic load on your servers.
✗ Prevents real-time analytics collection 
  (clicks, location) after the first request.
302 Found (Temporary Redirect)
Browser always sends the request to the 
shortener server first. The server then returns 
the 302 with the Location header.

✓ Enables accurate click tracking & analytics.
✗ Increases latency and request volume on your 
  servers.

Two Hashing Strategies

Option 1: Hashing with Collision Resolution

Take the long URL, compute a cryptographic hash (like MD5 or SHA-1), and pick the first 7 characters.
The collision issue: Different long URLs can yield the same 7-character hash.
Resolution: If a collision is detected in the database, append a predefined unique salt string to the long URL and re-hash. This requires checking the database on every write, adding write latency.

Option 2: Base62 Encoding with Unique ID generator

Use a distributed ID generator (e.g. Snowflake IDs or Flickr ticket server) to generate a unique 64-bit integer ID for each new URL. Then, convert that base-10 number to a base-62 string.
Base62 uses: [0-9], [a-z], [A-Z] (10 + 26 + 26 = 62 possibilities).
A 7-character Base62 string allows 627 ≈ 3.5 trillion unique URLs, which easily fits our 10-year storage requirements.

Base Conversion Example
ID = 200921567493 (Base 10)
Base 62 division loop → "zn8x9A" (Base 62 String)

System Architecture

Client Load Balancer Shortener APIs Redis (Hash → URL) SQL/NoSQL Store 1. Check cache 2. Cache miss -> DB lookup
URL Shortener Architecture: Caching is critical for high-volume 302 redirects

Optimizing Redirections with Caching:

Since this is a read-heavy system (reads/writes = 10:1), we cache the mapping of short_hash → long_url in Redis.
Write path: When a new short link is created, save it to the database, and write to Redis optionally (or let it load on the first read).
Read path: The API first checks Redis. If it's a hit, return 302 immediately. On a cache miss, query the DB, store it in Redis, and return.

Check Your Understanding

1. In a URL shortener, what is the primary benefit of using a 302 Found redirect instead of a 301 Moved Permanently redirect?
2. How many distinct characters does Base62 conversion use?
3. When using "Base62 Encoding with Unique ID generator", how are hash collisions prevented?