
Scaling from Zero to Millions of Users

~25 min · Foundations · Alex Xu Vol 1, Ch 1

Primary source: Alex Xu, System Design Interview Vol 1, Chapter 1

The big-picture journey every scalable system takes. This is the mental model that frames every design decision in this course.

The Journey

Every massive system started small. Twitter, Instagram, Uber — all began as a single server. Understanding the evolutionary path from Day 1 to 100 million users gives you a mental map for when to add which components — and why.

[Figure: Stage 1, single server (Day 1): client → one server running web, app, and DB. Stage 2, separate web and database tiers: client → web server → database (SQL/NoSQL). Stage 3, load balancer + multiple web servers: clients → load balancer → web servers 1 and 2, primary DB (writes), replica DB (reads), cache (Redis). Stage 4, CDN + sharding + message queues: CDN, LB + servers, message queue, shard 1 DB (users A–M), shard 2 DB (users N–Z), search index (Elasticsearch). Each component gets added when the previous stage's bottleneck is hit.]
The 4-stage evolution: every system follows roughly this path

Stage by Stage

Stage 1 — Single Server

Everything runs on one machine: web server, application logic, and database. This is fine for the first few thousand users. The first limit you hit is that machine's CPU and memory.

Bottleneck: One machine does everything. It will fall over under load, and a single crash takes down the whole system: it is a single point of failure (SPOF).

Stage 2 — Separate Web and Database Tiers

Move the database to its own server. Now each tier can scale independently: for example, you can vertically scale the DB server (more RAM to keep indexes in memory) without touching the web tier.

Bottleneck: There is still only one web server and one DB server, and each is a SPOF.

Stage 3 — Load Balancer + Horizontal Web Scaling + DB Replication

Add a load balancer in front of multiple web servers; to handle more traffic, you add more web servers behind it. Add read replicas alongside the primary DB: reads (often the large majority of traffic in read-heavy apps, e.g. 80–90%) go to the replicas, and writes go to the primary, which copies them to the replicas.
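The two routing decisions in Stage 3 can be sketched in a few lines. This is a minimal illustration, not how any particular load balancer or database driver works: the class names, host names, and the rule "a statement starting with SELECT is a read" are all assumptions made for the example.

```python
from __future__ import annotations

import itertools


class RoundRobinBalancer:
    """Hypothetical load balancer: hands each request to the next web server in turn."""

    def __init__(self, servers: list[str]) -> None:
        self._cycle = itertools.cycle(servers)

    def pick(self) -> str:
        return next(self._cycle)


class ReplicatedDB:
    """Hypothetical router: writes go to the primary, reads rotate across replicas."""

    def __init__(self, primary: str, replicas: list[str]) -> None:
        self.primary = primary
        # With no replicas configured, reads fall back to the primary.
        self._read_cycle = itertools.cycle(replicas or [primary])

    def route(self, sql: str) -> str:
        # Naive classification (an assumption): anything starting with SELECT is a read.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._read_cycle) if is_read else self.primary
```

With `ReplicatedDB("db-primary", ["db-replica-1", "db-replica-2"])`, successive SELECTs alternate between the two replicas while every UPDATE goes to `db-primary`. Real systems classify statements more carefully (e.g. `SELECT ... FOR UPDATE` must hit the primary) and must account for replication lag: a read sent to a replica right after a write may not see that write yet.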

Why this order?

The web tier can be made stateless, which makes it easy to scale horizontally. The database holds state that must stay consistent, so it is much harder to scale. You therefore squeeze as much as you can out of the web tier first, then tackle the DB.

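Add a cache (Redis) between the web servers and the database. Reads of frequently accessed data are served from memory and never reach the DB; only cache misses do. With a high hit rate this removes most of the read load: at a 90% hit rate, for example, only 10% of those reads reach the database.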
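A common way to put a cache in front of the DB is the cache-aside (lazy loading) pattern: check the cache, and on a miss read the DB and populate the cache. The sketch below assumes a plain in-process dict as a stand-in for Redis, a hypothetical `load_from_db` function, and an arbitrary 60-second TTL.

```python
from __future__ import annotations

import time
from typing import Any, Callable


class CacheAside:
    """Cache-aside read path. A dict stands in for Redis in this sketch."""

    def __init__(self, load_from_db: Callable[[str], Any], ttl_seconds: float = 60.0) -> None:
        self._load_from_db = load_from_db  # hypothetical DB read function
        self._ttl = ttl_seconds
        self._store: dict[str, tuple[float, Any]] = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Any:
        entry = self._store.get(key)
        if entry is not None and entry[0] > time.monotonic():
            self.hits += 1  # served from the cache; the DB is not touched
            return entry[1]
        self.misses += 1  # missing or expired: read from the DB...
        value = self._load_from_db(key)
        self._store[key] = (time.monotonic() + self._ttl, value)  # ...and populate the cache
        return value

    def invalidate(self, key: str) -> None:
        # Call after a write so later reads don't return a stale value.
        self._store.pop(key, None)


if __name__ == "__main__":
    db_reads: list[str] = []

    def load_user(key: str) -> dict:
        db_reads.append(key)  # record each real DB read
        return {"id": key}

    cache = CacheAside(load_user, ttl_seconds=60)
    for _ in range(10):
        cache.get("user:42")
    print(cache.hits, cache.misses, len(db_reads))  # 9 1 1
```

Ten reads of the same key cost one DB query. In general, if h is the cache hit rate, the DB sees a fraction (1 − h) of the reads. The TTL and the `invalidate` call bound how stale a cached value can get after the underlying row changes.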

Stage 4 — CDN, Sharding, Message Queues

At massive scale, you shard your database horizontally (split data across multiple DB servers). Add a CDN to serve static assets from edge nodes close to users. Add message queues to decouple slow async work from the request path.
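Two of the Stage 4 ideas, routing a user to a shard and moving slow work onto a queue, can be sketched together. This assumes the username-range split from the diagram (A–M on shard 1, N–Z on shard 2); the host names and the functions `shard_for`, `start_worker`, and `handle_signup` are hypothetical, and Python's in-process `queue.Queue` stands in for a real message broker.

```python
from __future__ import annotations

import bisect
import queue
import threading

# Hypothetical range-based shard map mirroring the diagram.
SHARD_UPPER_BOUNDS = ["M", "Z"]  # inclusive upper letter of each range
SHARD_HOSTS = ["shard-1-db", "shard-2-db"]


def shard_for(username: str) -> str:
    """Pick the shard whose letter range contains the username's first letter."""
    first = username[:1].upper()
    if not ("A" <= first <= "Z"):
        raise ValueError(f"no shard covers {username!r}")
    return SHARD_HOSTS[bisect.bisect_left(SHARD_UPPER_BOUNDS, first)]


def start_worker(jobs: queue.Queue, done: list[str]) -> threading.Thread:
    """Background worker that drains slow jobs off the request path."""

    def run() -> None:
        while True:
            job = jobs.get()
            if job is None:  # sentinel value: shut the worker down
                jobs.task_done()
                break
            done.append(f"processed {job}")  # stand-in for slow work (email, thumbnails, ...)
            jobs.task_done()

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker


def handle_signup(username: str, jobs: queue.Queue) -> str:
    """Request handler: route to the right shard, enqueue the slow part, return at once."""
    host = shard_for(username)
    jobs.put(f"welcome-email:{username}")
    return f"created {username} on {host}"
```

Splitting by first letter is easy to reason about, but names are not spread evenly across the alphabet, so shards can end up unevenly loaded; hashing the shard key is a common alternative. A real broker also runs as its own service, so queued jobs survive a web server restart, which an in-process queue does not.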

At this stage you also typically split your monolith into services (not necessarily microservices; "services" can be coarse-grained).

Stateless vs Stateful Web Tier

To scale the web tier horizontally, servers must be stateless — they must not store any user session data locally. Store sessions in a shared cache (Redis) instead. This way any request can go to any server.

❌ Stateful (Don't do this)
Server 1: stores User A's session
Server 2: stores User B's session

Problem: User A's next request MUST
go to Server 1 — sticky sessions.
Can't freely load balance.
✅ Stateless (Do this)
Server 1: stateless, reads session from Redis
Server 2: stateless, reads session from Redis

User A's request can go to any server.
Add servers freely. Remove servers freely.
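The stateless setup can be sketched with two server objects that share one session store. Here a dict passed to both servers stands in for Redis (an assumption of this sketch; a real deployment would use a Redis client and set a TTL on each session key), and the `WebServer` class and its methods are hypothetical.

```python
from __future__ import annotations

import secrets
from typing import Optional


class WebServer:
    """Stateless web server: it keeps no session data of its own."""

    def __init__(self, name: str, sessions: dict) -> None:
        self.name = name
        self.sessions = sessions  # reference to the shared store, not a private copy

    def login(self, user: str) -> str:
        session_id = secrets.token_hex(16)
        self.sessions[session_id] = {"user": user}
        return session_id  # returned to the client, e.g. as a cookie

    def whoami(self, session_id: str) -> Optional[str]:
        session = self.sessions.get(session_id)
        return session["user"] if session else None
```

If User A logs in through `WebServer("web-1", store)` and the next request lands on `WebServer("web-2", store)`, the second server still finds the session because both read the same store. A server built with its own private dict would not, which is exactly the sticky-session problem in the stateful case.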

Check Your Understanding

1. You have one web server and one DB server. The web server is at 95% CPU during peak. What should you add first?
2. Why must web servers be stateless to scale horizontally?
3. What problem does a cache (Redis) primarily solve in Stage 3?
4. In DB replication (primary + replica), where should writes go?

🎓 This evolution is the backbone of every case study. When you design YouTube or Twitter, you'll follow roughly this path. Ask me to walk through how a specific company scaled if you want concrete examples.