← Course Index

Rate Limiting

~25 min · Advanced Patterns · Alex Xu Vol 1, Ch 4

Ref
Primary Source
Alex Xu Vol 1, Chapter 4 — "Design a Rate Limiter"

Explores the mathematical algorithms behind limiters and how to build a high-performance distributed rate limiter using Redis.

What is Rate Limiting?

A rate limiter controls the rate of traffic sent by a client or service. If a client exceeds the limit, the server rejects their request with an HTTP status code 429 Too Many Requests. This protects backends from:

Core Rate Limiting Algorithms

Different use cases call for different algorithms. Here are the 5 classic algorithms:

AlgorithmHow it WorksPros / Cons
Token Bucket
Used by Amazon, Stripe
A bucket holds tokens. Tokens are added at a constant rate. Each request takes a token. If empty, request is dropped. ✓ Handles short bursts of traffic
✓ Simple to implement in Redis
Leaky Bucket
Used by Shopify
Requests enter a queue. The queue drains (processes requests) at a constant rate. If queue fills, requests drop. ✓ Guarantees smooth, stable output rate
✗ Bursts are delayed rather than instant
Fixed Window Counter Divides time into windows (e.g., 1 min). Tracks request counts per window. Resets count at start of next window. ✗ Traffic spike at boundaries can double the allowed limit (e.g., 200 requests in 2 seconds)
Sliding Window Log Keeps timestamps of all client requests in a sorted set. Removes expired timestamps. Count is size of set. ✗ Memory-intensive: stores every single timestamp even if request is blocked
Sliding Window Counter Combines fixed window and sliding window. Estimates rate by computing weight of current and previous window counts. ✓ Low memory footprint, smooths out boundary spikes

Distributed Rate Limiter Architecture

In a distributed system, a client's requests hit different API servers. Therefore, counters must be stored in a centralized, ultra-low-latency cache like Redis rather than local server memory.

Client API Gateway (Rate Limit Filter) Redis Cache IP/User ID → Token Count Checked via Lua Script Request Check Limit
Distributed Rate Limiter: API Gateway checks Redis tokens before passing traffic to internal microservices

Two Key Distributed Challenges:

  1. Race Conditions: If two requests check Redis at the same time, both might see 1 token remaining, accept the request, and decrement the counter, allowing a duplicate request through.
    Solution: Use **Lua Scripts** or Redis sorted sets with transactions. Lua scripts run atomically in Redis, preventing race conditions.
  2. Synchronization Latency: If multiple gateways run across different regions, querying a central Redis adds latency.
    Solution: Use local in-memory buffers in the API gateways that sync asynchronously with Redis in batches, or deploy regional Redis replica systems.

Rate Limiter Headers

Good APIs inform clients about their rate limit state using response headers:

Check Your Understanding

1. Which rate limiting algorithm is ideal if you want to handle sudden bursts of traffic gracefully without dropping requests immediately?
2. How are race conditions solved in high-throughput distributed rate limiters querying Redis?
3. What HTTP response status code must a rate limiter return when a client exceeds their quota?