
Back-of-the-Envelope Estimation

~25 min · Foundations

Primary source: Alex Xu, System Design Interview Vol 1, Chapter 2

This chapter covers estimation with worked examples. Supplement with Jeff Dean's "Latency Numbers Every Programmer Should Know."

Why Estimation Matters

Estimation signals that you understand scale. A system with 1,000 users is architecturally different from one with 100 million. Knowing the order of magnitude tells you which components to reach for, where the bottlenecks will be, and how many servers you need.

In an interview, estimation is Step 2 — it takes about 5 minutes and sets up every design decision that follows.

Interview mindset

Don't worry about getting exact numbers; interviewers care that you can reason about scale. Round aggressively: "100 million DAU, each making about one request per day, is 100M ÷ 86,400 ≈ 1,200 QPS; call it 1K QPS."

The Numbers You Must Know

Power of Two — Data Units

Memory & Storage Units
1 KB = 2¹⁰ bytes ≈ 1 thousand bytes
1 MB = 2²⁰ bytes ≈ 1 million bytes
1 GB = 2³⁰ bytes ≈ 1 billion bytes
1 TB = 2⁴⁰ bytes ≈ 1 trillion bytes
1 PB = 2⁵⁰ bytes ≈ 1 quadrillion bytes
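The ≈ column treats each power of two as the nearest power of ten. The short Python sketch below measures how far that shortcut drifts at each unit; the helper name `approximation_error` is just for illustration.

```python
UNITS = ["KB", "MB", "GB", "TB", "PB"]

def approximation_error(step: int) -> float:
    """Percent by which 2**(10*step) exceeds its decimal stand-in 10**(3*step)."""
    binary = 2 ** (10 * step)
    decimal = 10 ** (3 * step)
    return (binary / decimal - 1) * 100

# Walk KB..PB and show how the binary/decimal gap widens with each step.
for step, unit in enumerate(UNITS, start=1):
    print(f"1 {unit}: 2^{10 * step} = {2 ** (10 * step):,} bytes, "
          f"{approximation_error(step):.1f}% above 10^{3 * step}")
```

The gap widens from about 2.4% at KB to about 12.6% at PB, which is still well inside the tolerance of a back-of-the-envelope estimate.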

Latency — Orders of Magnitude

These are the most important numbers in system design. Knowing them lets you reason about where time is being spent.

Operation                                   Latency
L1 cache                                    1 ns
L2 cache                                    4 ns
Main memory (RAM)                           100 ns
SSD random read                             16 μs
Datacenter round trip                       500 μs
HDD sequential 1 MB read                    825 μs
Cross-continent round trip (CA ↔ NL ↔ CA)   150 ms

Key insight: Main memory is over 100× faster than an SSD random read (100 ns vs 16 μs), and an SSD read is nearly 10,000× faster than a cross-continent round trip (16 μs vs 150 ms). This is why caching works.
Latency numbers: notice the orders of magnitude between tiers
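To see the ratios between tiers directly, here is a small Python sketch that stores the table's figures in nanoseconds and divides them. The dictionary keys are shortened labels chosen for this example, and `slowdown` is a hypothetical helper, not part of any library.

```python
# Latency figures from the table, converted to nanoseconds.
LATENCY_NS = {
    "L1 cache": 1,
    "L2 cache": 4,
    "Main memory (RAM)": 100,
    "SSD random read": 16_000,
    "Datacenter round trip": 500_000,
    "HDD sequential 1 MB read": 825_000,
    "Cross-continent round trip": 150_000_000,
}

def slowdown(slower: str, faster: str) -> float:
    """How many times longer the `slower` operation takes than the `faster` one."""
    return LATENCY_NS[slower] / LATENCY_NS[faster]

print(f"SSD vs RAM: {slowdown('SSD random read', 'Main memory (RAM)'):,.0f}x")
print(f"Cross-continent vs SSD: {slowdown('Cross-continent round trip', 'SSD random read'):,.0f}x")
print(f"Cross-continent vs RAM: {slowdown('Cross-continent round trip', 'Main memory (RAM)'):,.0f}x")
```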

Availability — The Nines

Availability → Downtime Per Year
99% (two nines) → 3.65 days
99.9% (three nines) → 8.76 hours
99.99% (four nines) → 52.6 minutes ← typical SLA target
99.999% (five nines) → 5.26 minutes
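The downtime column follows from multiplying the unavailable fraction by the length of a year. A minimal Python sketch, assuming a 365-day year with no leap days; `downtime_per_year` is an illustrative name:

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000; ignores leap years

def downtime_per_year(availability_pct: float) -> float:
    """Allowed downtime in minutes per year for a given availability percentage."""
    unavailable_fraction = 1 - availability_pct / 100
    return unavailable_fraction * SECONDS_PER_YEAR / 60

# Print each target in minutes, hours and days for comparison with the list above.
for nines in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_per_year(nines)
    print(f"{nines}% -> {minutes:,.1f} min/year "
          f"({minutes / 60:,.2f} h, {minutes / 1440:,.2f} days)")
```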

The Core Formulas

QPS (Queries Per Second)

Formula
QPS = DAU × avg_actions_per_user ÷ 86,400

Example: Twitter
300M DAU × 10 reads per user per day ÷ 86,400
= 3B reads ÷ 86,400
≈ 34,700 read QPS

Peak QPS ≈ avg × 2 to 5 (traffic is bursty)
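The formula and the peak rule of thumb translate directly into code. This Python sketch assumes requests are spread evenly across the 86,400 seconds of a day; the default burst factor of 2 is an assumption taken from the low end of the 2 to 5 range, and the function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400

def average_qps(dau: int, actions_per_user_per_day: float) -> float:
    """Average queries per second if daily requests were spread evenly over the day."""
    return dau * actions_per_user_per_day / SECONDS_PER_DAY

def peak_qps(avg_qps: float, burst_factor: float = 2.0) -> float:
    """Peak QPS as a multiple of the average (assumed 2x unless told otherwise)."""
    return avg_qps * burst_factor

# Twitter read example: 300M DAU, 10 reads per user per day.
avg = average_qps(300_000_000, 10)
print(f"average ≈ {avg:,.0f} QPS, peak ≈ {peak_qps(avg):,.0f} to {peak_qps(avg, 5):,.0f} QPS")
```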

Storage

Formula
Storage = daily_new_records × record_size × retention_years × 365

Example: Tweet storage for 5 years
100M tweets/day × 300 bytes × 365 × 5
= 30 GB/day × 1,825 days
≈ 54.75 TB → ~55 TB
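The same arithmetic in Python, as a sketch that uses decimal units (1 TB = 10¹² bytes) to match the ≈ conventions above. Like the formula, it counts only raw record bytes; replication and indexes are left out, and `storage_bytes` is an illustrative name.

```python
def storage_bytes(daily_new_records: int, record_size_bytes: int, retention_years: int) -> int:
    """Raw storage for a retention window (no replication, indexes or compression)."""
    return daily_new_records * record_size_bytes * 365 * retention_years

# Tweet example: 100M tweets/day, 300 bytes each, kept for 5 years.
total = storage_bytes(100_000_000, 300, 5)
print(f"{total / 10**12:.2f} TB")  # decimal terabytes
```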

Bandwidth

Formula
Bandwidth = QPS × avg_response_size

Example: Photo serving at 12K QPS
12,000 QPS × 200 KB per photo
= 2,400,000 KB/s
≈ 2.4 GB/s outbound
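A Python sketch of the bandwidth formula, treating 200 KB as 200,000 bytes. It also converts to bits per second, since network links are usually rated in Gbps; the helper name is hypothetical.

```python
def bandwidth_bytes_per_sec(qps: float, avg_response_bytes: int) -> float:
    """Outbound bytes per second if every request returns one average-sized response."""
    return qps * avg_response_bytes

# Photo example: 12K QPS serving 200 KB photos.
bw = bandwidth_bytes_per_sec(12_000, 200_000)
print(f"{bw / 10**9:.1f} GB/s ≈ {bw * 8 / 10**9:.1f} Gbps")
```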

Common Reference Sizes

Item                              Typical size
A tweet (text)                    ~300 bytes
A DB metadata row                 ~1 KB
A web page (HTML)                 ~100 KB
A compressed photo                ~200 KB
A high-res photo                  ~2 MB
A 1-minute video (compressed)     ~10 MB
Seconds in a day                  86,400

Practice Estimation

1. A social network has 100M DAU. Each user posts 2 items per day and reads 20. What is the approximate read QPS?
2. Roughly how many times slower is an SSD random read than a main-memory (RAM) read?
3. A service stores 50M new records per day, each 500 bytes, for 3 years. Approximate storage needed?
4. A system needs 99.99% availability. How much downtime per year is that?
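Once you have worked the questions by hand, you can check answers 1, 3 and 4 by plugging the numbers into the formulas from this section. A minimal Python sketch, using decimal units for terabytes and a 365-day year:

```python
SECONDS_PER_DAY = 86_400
MINUTES_PER_YEAR = 365 * 24 * 60

# Q1: read QPS for 100M DAU reading 20 items/day (the 2 posts/day are writes, not reads)
read_qps = 100_000_000 * 20 / SECONDS_PER_DAY

# Q3: 50M records/day x 500 bytes x 365 days x 3 years, in decimal TB
storage_tb = 50_000_000 * 500 * 365 * 3 / 10**12

# Q4: downtime allowed per year at 99.99% availability
downtime_min = (1 - 0.9999) * MINUTES_PER_YEAR

print(f"Q1 ≈ {read_qps:,.0f} read QPS")
print(f"Q3 ≈ {storage_tb:.1f} TB")
print(f"Q4 ≈ {downtime_min:.1f} minutes/year")
```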
