
Load Balancing

~20 min · Foundations · System Design Primer · Gaurav Sen

Primary Source
Gaurav Sen — YouTube: "Load Balancing" + System Design Primer: Load Balancer section

Gaurav's explanation of consistent hashing and load balancing is one of the clearest available. Watch it alongside this lesson.

What Is a Load Balancer?

A load balancer sits in front of a pool of servers and distributes incoming traffic across them. It solves two problems simultaneously: scalability (spread load so no server is overwhelmed) and availability (if a server dies, route around it).

[Figure: Clients 1–3 → Load Balancer → Server A (healthy ✓), Server B (healthy ✓), Server C (failing ✗)]
Load balancer distributes traffic and performs health checks; failing servers are removed from rotation.

L4 vs L7 Load Balancers

L4 — Transport Layer (TCP/UDP)
Routes based on IP address and port.
Cannot see HTTP content — no path routing.
Very fast — minimal processing per packet.

Use when:
- Raw TCP throughput matters
- Non-HTTP protocols (DB connections)
- Maximum performance is critical

Examples: AWS NLB, HAProxy (TCP mode)

L7 — Application Layer (HTTP)
Routes based on URL path, headers, cookies.
Can inspect content → smarter routing.
Supports TLS (SSL) termination and rate limiting.

Use when:
- Path-based routing (/api vs /static)
- Sticky sessions via cookies
- Content-based decisions

Examples: NGINX, AWS ALB, Traefik

💡 Interview tip

Default to L7 in interviews unless you have a specific reason for L4. L7 gives you much more control. Say "I'll use an L7 load balancer since we need path-based routing between our API and static file servers."
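
To make the L4/L7 distinction concrete, here is a minimal Python sketch of the one decision an L7 balancer can make and an L4 balancer cannot: choosing a backend pool from the request path. The pool addresses, the `ROUTES` table, and the `choose_pool` function are all hypothetical, invented for illustration. Real balancers such as NGINX express this in configuration rather than application code.

```python
# Hypothetical routing table: path prefix -> backend pool.
# Addresses are made up for illustration.
ROUTES = [
    ("/api/", ["10.0.1.10:8080", "10.0.1.11:8080"]),  # API servers
    ("/static/", ["10.0.2.10:8080"]),                 # static file servers
]
DEFAULT_POOL = ["10.0.3.10:8080"]


def choose_pool(path: str) -> list[str]:
    """Return the backend pool for the longest matching path prefix."""
    best_prefix, best_pool = "", DEFAULT_POOL
    for prefix, pool in ROUTES:
        if path.startswith(prefix) and len(prefix) > len(best_prefix):
            best_prefix, best_pool = prefix, pool
    return best_pool
```

An L4 balancer never parses the HTTP request, so it has no `path` to pass to a function like this. It can only choose based on IP address and port.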

Load Balancing Algorithms

Algorithm | How It Works | Best For
Round Robin | Requests go to each server in sequence: A, B, C, A, B, C… | Servers with equal capacity and stateless requests
Weighted Round Robin | Like round robin, but server A might get 2× the traffic of server B based on capacity | Heterogeneous server fleet
Least Connections | Route to the server with the fewest active connections | Variable-duration requests (some take 1 ms, some 10 s)
IP Hash | Hash the client IP → same client always goes to the same server | Sticky sessions without cookies (fragile: clients behind one NAT all land on the same server, and resizing the pool remaps most clients)
Consistent Hashing | Hash ring distributes load with minimal reshuffling when servers are added/removed | Distributed caches, database routing
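
The algorithms above are small enough to sketch in a few lines each. The following Python is an illustrative sketch, not how any particular load balancer implements them. The server names, the `stable_hash` helper, the `HashRing` class, and the choice of 100 virtual nodes per server are assumptions made for the example, and the weighted round robin shown is the naive "repeat by weight" form.

```python
import bisect
import hashlib
import itertools

SERVERS = ["A", "B", "C"]  # hypothetical server names


def stable_hash(key: str) -> int:
    """Deterministic hash (Python's built-in hash() is salted per process)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


# Round robin: cycle through the servers in order.
round_robin = itertools.cycle(SERVERS)

# Weighted round robin (naive form): repeat each server by its weight.
WEIGHTS = {"A": 2, "B": 1}  # A gets 2x the traffic of B
weighted = itertools.cycle([s for s, w in WEIGHTS.items() for _ in range(w)])


def least_connections(active: dict[str, int]) -> str:
    """Pick the server with the fewest active connections."""
    return min(active, key=active.get)


def ip_hash(client_ip: str, servers: list[str]) -> str:
    """Same IP -> same server, as long as the server list does not change."""
    return servers[stable_hash(client_ip) % len(servers)]


class HashRing:
    """Consistent hashing with virtual nodes (replicas) per server."""

    def __init__(self, servers, replicas=100):
        self.replicas = replicas
        self._ring = []  # sorted list of (hash, server) points on the ring
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self.replicas):
            bisect.insort(self._ring, (stable_hash(f"{server}#{i}"), server))

    def remove(self, server):
        self._ring = [(h, s) for h, s in self._ring if s != server]

    def get(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        i = bisect.bisect_left(self._ring, (stable_hash(key), ""))
        return self._ring[i % len(self._ring)][1]
```

The difference between the last two shows up when the pool changes. With `ip_hash`, changing `len(servers)` changes the modulus, so most clients move. With `HashRing`, calling `remove("C")` only moves the keys that were on C, and every other key stays where it was.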

Health Checks

Load balancers periodically probe backend servers (usually every 5–30 seconds) to verify they're healthy. If a server fails N consecutive checks, it's removed from the rotation and stops receiving new requests. The load balancer keeps probing it, and once it passes checks again it's re-added.

Active health check — LB probes servers directly (HTTP GET /health → 200 OK).

Passive health check — LB monitors real traffic responses; too many 5xx errors → mark unhealthy.
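
The "N consecutive failures" rule is a small state machine, sketched below in Python. The `BackendHealth` class and its field names are hypothetical. The failure threshold of 3 and the requirement of 2 consecutive passes before re-adding a server are assumed values for illustration, since the lesson only says a recovered server is re-added.

```python
from dataclasses import dataclass


@dataclass
class BackendHealth:
    """Tracks one backend's health from a stream of check results."""
    fail_threshold: int = 3  # N consecutive failures -> unhealthy (assumed)
    rise_threshold: int = 2  # consecutive passes -> healthy again (assumed)
    healthy: bool = True
    _fails: int = 0
    _passes: int = 0

    def record(self, ok: bool) -> bool:
        """Record one check result; return whether the backend is in rotation."""
        if ok:
            self._fails = 0
            self._passes += 1
            if not self.healthy and self._passes >= self.rise_threshold:
                self.healthy = True
        else:
            self._passes = 0
            self._fails += 1
            if self.healthy and self._fails >= self.fail_threshold:
                self.healthy = False
        return self.healthy
```

The same `record` call serves both styles: an active checker would pass in whether its own probe of `/health` returned 200, while a passive checker would pass in whether a real client response was something other than a 5xx.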

Sticky Sessions

Sometimes a user's subsequent requests must go to the same server (e.g., the server stores session state locally). The LB typically uses a cookie to pin that user to a specific server. Avoid sticky sessions when possible: they spread load unevenly, lose the user's state if the pinned server dies, and defeat the purpose of stateless servers. Use a shared cache (Redis) for session state instead.
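
As a rough sketch of cookie-based pinning, the Python below honors an existing pin if that server is still healthy and otherwise assigns a new server by round robin. The `StickyBalancer` class, the `lb_server` cookie name, and the plain-dict representation of cookies are assumptions for illustration, not any real load balancer's API.

```python
import itertools

COOKIE_NAME = "lb_server"  # hypothetical cookie name


class StickyBalancer:
    def __init__(self, servers):
        self.servers = list(servers)
        self._rr = itertools.cycle(self.servers)

    def route(self, cookies: dict[str, str], healthy: set[str]):
        """Return (server, cookies_to_set) for one request."""
        pinned = cookies.get(COOKIE_NAME)
        if pinned in healthy:
            return pinned, {}  # honor the existing pin
        # No pin, or the pinned server is out of rotation: pick and re-pin.
        for _ in range(len(self.servers)):
            candidate = next(self._rr)
            if candidate in healthy:
                return candidate, {COOKIE_NAME: candidate}
        raise RuntimeError("no healthy servers")
```

Note what happens in the re-pin branch: the user lands on a different server, and any state held only on the old one is gone. A shared session store avoids exactly this failure mode.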

Check Your Understanding

1. You're routing /api/v1 requests to a Node.js cluster and /static requests to an S3 bucket. Which load balancer type do you need?
2. Your requests vary widely in processing time (some take 500 ms, some 50 ms). Which algorithm avoids piling new requests onto servers that are still busy with slow ones?
3. A server fails a health check three times. The load balancer removes it. 10 minutes later it passes a check. What happens?

🎓 Load balancing questions come up in every case study. Ask me how a load balancer interacts with consistent hashing (Lesson 09) or how database load balancers (like PgBouncer) differ from web load balancers.