Load Balancing

~20 min · Foundations · System Design Primer · Gaurav Sen


Primary Source

Gaurav Sen — YouTube: "Load Balancing" + System Design Primer: Load Balancer section

Gaurav's explanation of consistent hashing and load balancing is one of the clearest available. Watch it alongside this lesson.

What Is a Load Balancer?

A load balancer sits in front of a pool of servers and distributes incoming traffic across them. It solves two problems simultaneously: scalability (spread load so no server is overwhelmed) and availability (if a server dies, route around it).

Figure: A load balancer distributes traffic and performs health checks; failing servers are removed from rotation.

L4 vs L7 Load Balancers

L4 — Transport Layer (TCP/UDP)

Routes based on IP address and port.
Cannot see HTTP content — no path routing.
Very fast — minimal processing per packet.

Use when:
- Raw TCP throughput matters
- Non-HTTP protocols (DB connections)
- Maximum performance is critical

Examples: AWS NLB, HAProxy (TCP mode)

L7 — Application Layer (HTTP)

Routes based on URL path, headers, cookies.
Can inspect content → smarter routing.
Supports TLS (SSL) termination and rate limiting.

Use when:
- Path-based routing (/api vs /static)
- Sticky sessions via cookies
- Content-based decisions

Examples: NGINX, AWS ALB, Traefik
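
To make path-based routing concrete, here is a minimal sketch of the decision an L7 load balancer makes once it has parsed the request path. The pool names and the `choose_pool` function are hypothetical and are not taken from NGINX, ALB, or Traefik; real products express the same idea in their own configuration formats.

```python
# Hypothetical routing table: URL path prefix -> backend pool name.
ROUTES = {
    "/api": "api-servers",
    "/static": "static-servers",
}
DEFAULT_POOL = "web-servers"  # assumed fallback pool


def choose_pool(path: str) -> str:
    """Return the pool for the longest matching path prefix."""
    best = ""
    for prefix in ROUTES:
        # Match "/api" and "/api/..." but not "/apiary".
        if path == prefix or path.startswith(prefix + "/"):
            if len(prefix) > len(best):
                best = prefix
    return ROUTES.get(best, DEFAULT_POOL)
```

An L4 load balancer cannot make this decision because it only sees IP addresses and ports, never the path.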

💡 Interview tip

Default to L7 in interviews unless you have a specific reason for L4. L7 gives you much more control. Say "I'll use an L7 load balancer since we need path-based routing between our API and static file servers."

Load Balancing Algorithms

Algorithm	How It Works	Best For
Round Robin	Requests go to each server in sequence: A, B, C, A, B, C…	Servers with equal capacity and stateless requests
Weighted Round Robin	Like round robin, but server A might get 2× the traffic of server B based on capacity	Heterogeneous server fleet
Least Connections	Route to the server with the fewest active connections	Variable-duration requests (some take 1ms, some 10s)
IP Hash	Hash the client IP → same client always goes to same server	Sticky sessions without cookies (fragile: clients behind one NAT all land on the same server, and changing the pool size remaps most clients)
Consistent Hashing	Hash ring distributes load with minimal reshuffling when servers are added/removed	Distributed caches, database routing
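
The first four rows of the table can be sketched in a few lines each. This is a simplified illustration, not how any particular load balancer implements them: the function names are made up for this example, the weighted variant is the naive non-interleaved form, and consistent hashing is left out because it needs a hash ring.

```python
import hashlib
import itertools


def round_robin(servers):
    """Yield servers in a fixed repeating order: A, B, C, A, B, C..."""
    return itertools.cycle(servers)


def weighted_round_robin(weights):
    """Yield each server `weight` times per cycle (naive, non-interleaved)."""
    expanded = [name for name, w in weights.items() for _ in range(w)]
    return itertools.cycle(expanded)


def least_connections(active):
    """Pick the server with the fewest active connections."""
    return min(active, key=active.get)


def ip_hash(client_ip, servers):
    """Map a client IP to a server; stable only while `servers` is unchanged."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]
```

For example, `least_connections({"A": 5, "B": 2, "C": 7})` returns `"B"`. Note that `ip_hash` depends on `len(servers)`, so adding or removing a server changes the mapping for most clients; that is the reshuffling consistent hashing is designed to minimize.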

Health Checks

Load balancers periodically probe backend servers (usually every 5–30 seconds) to verify they're healthy. If a server fails N consecutive checks, it's removed from the rotation. Once it starts passing checks again, it's added back.

Active health check — LB probes servers directly (HTTP GET /health → 200 OK).

Passive health check — LB monitors real traffic responses; too many 5xx errors → mark unhealthy.
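
The "fail N consecutive checks, then remove; recover, then re-add" rule is a small state machine. The sketch below tracks one backend. The class name, the default thresholds, and the `rise_threshold` parameter (how many consecutive passes are needed to come back) are assumptions for illustration, not the defaults of any specific load balancer.

```python
class HealthTracker:
    """Tracks one backend; thresholds are illustrative assumptions."""

    def __init__(self, fail_threshold=3, rise_threshold=1):
        self.fail_threshold = fail_threshold  # consecutive failures before removal
        self.rise_threshold = rise_threshold  # consecutive passes before re-adding
        self.healthy = True
        self._fails = 0
        self._passes = 0

    def record(self, ok: bool) -> bool:
        """Record one check result and return the current health state."""
        if ok:
            self._fails = 0
            self._passes += 1
            if not self.healthy and self._passes >= self.rise_threshold:
                self.healthy = True
        else:
            self._passes = 0
            self._fails += 1
            if self.healthy and self._fails >= self.fail_threshold:
                self.healthy = False
        return self.healthy
```

The same tracker can serve both styles: an active check would call `record()` with the result of each probe to `/health`, while a passive check would call it with something like `status < 500` for each real response.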

Sticky Sessions

Sometimes a user's subsequent requests must go to the same server (e.g., server stores local state). The LB uses a cookie to pin that user to a specific server. Avoid sticky sessions when possible — they complicate scaling and defeat the purpose of stateless servers. Use a shared cache (Redis) for session state instead.
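
As a rough sketch of cookie-based pinning, the balancer below honors a cookie that names a server still in the pool and otherwise picks a server and asks the client to remember it. The cookie name `lb_server` and the `StickyBalancer` class are hypothetical; real load balancers typically use their own (often opaque or signed) cookie values.

```python
import itertools

COOKIE_NAME = "lb_server"  # hypothetical cookie name


class StickyBalancer:
    def __init__(self, servers):
        self.servers = set(servers)
        # Round robin is used only for clients that are not pinned yet.
        self._fallback = itertools.cycle(sorted(servers))

    def route(self, cookies):
        """Return (server, cookies_to_set) for one request."""
        pinned = cookies.get(COOKIE_NAME)
        if pinned in self.servers:
            return pinned, {}  # honor the existing pin
        server = next(self._fallback)
        return server, {COOKIE_NAME: server}  # pin the client from now on
```

The sketch also shows the scaling cost: if the pinned server leaves the pool, the client is re-pinned elsewhere and any state held only on the old server is lost, which is why a shared session store is usually the better choice.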

Check Your Understanding

1. You're routing /api/v1 requests to a Node.js cluster and /static requests to an S3 bucket. Which load balancer type do you need?

2. Request processing times in your fleet vary widely (some requests take 500ms, some 50ms). Which algorithm avoids overloading servers that are tied up with slow requests?

3. A server fails a health check three times. The load balancer removes it. 10 minutes later it passes a check. What happens?

🎓 Load balancing questions come up in every case study. Ask me how a load balancer interacts with consistent hashing (Lesson 09) or how database load balancers (like PgBouncer) differ from web load balancers.