Rate Limiting

~25 min · Advanced Patterns · Alex Xu Vol 1, Ch 4

Ref

Primary Source

Alex Xu Vol 1, Chapter 4 — "Design a Rate Limiter"

Explores the mathematical algorithms behind limiters and how to build a high-performance distributed rate limiter using Redis.

What is Rate Limiting?

A rate limiter controls the rate of traffic sent by a client or service. If a client exceeds the limit, the server rejects their request with an HTTP status code 429 Too Many Requests. This protects backends from:

Abuse & DDoS attacks: Stopping malicious actors from flooding services.
Cost Control: Limiting calls to expensive third-party APIs (like OpenAI or Stripe).
Resource Starvation: Ensuring one user doesn't hog all database connections or memory.

Core Rate Limiting Algorithms

Different use cases call for different algorithms. Here are the 5 classic algorithms:

Algorithm	How it Works	Pros / Cons
Token Bucket Used by Amazon, Stripe	A bucket holds tokens. Tokens are added at a constant rate. Each request takes a token. If empty, request is dropped.	✓ Handles short bursts of traffic ✓ Simple to implement in Redis
Leaky Bucket Used by Shopify	Requests enter a queue. The queue drains (processes requests) at a constant rate. If queue fills, requests drop.	✓ Guarantees smooth, stable output rate ✗ Bursts are delayed rather than instant
Fixed Window Counter	Divides time into windows (e.g., 1 min). Tracks request counts per window. Resets count at start of next window.	✗ Traffic spike at boundaries can double the allowed limit (e.g., 200 requests in 2 seconds)
Sliding Window Log	Keeps timestamps of all client requests in a sorted set. Removes expired timestamps. Count is size of set.	✗ Memory-intensive: stores every single timestamp even if request is blocked
Sliding Window Counter	Combines fixed window and sliding window. Estimates rate by computing weight of current and previous window counts.	✓ Low memory footprint, smooths out boundary spikes

Distributed Rate Limiter Architecture

In a distributed system, a client's requests hit different API servers. Therefore, counters must be stored in a centralized, ultra-low-latency cache like Redis rather than local server memory.

Distributed Rate Limiter: API Gateway checks Redis tokens before passing traffic to internal microservices

Two Key Distributed Challenges:

Race Conditions: If two requests check Redis at the same time, both might see 1 token remaining, accept the request, and decrement the counter, allowing a duplicate request through.
Solution: Use **Lua Scripts** or Redis sorted sets with transactions. Lua scripts run atomically in Redis, preventing race conditions.
Synchronization Latency: If multiple gateways run across different regions, querying a central Redis adds latency.
Solution: Use local in-memory buffers in the API gateways that sync asynchronously with Redis in batches, or deploy regional Redis replica systems.

Rate Limiter Headers

Good APIs inform clients about their rate limit state using response headers:

X-RateLimit-Limit: The maximum number of allowed requests in the time window.
X-RateLimit-Remaining: The remaining number of allowed requests in the current window.
X-RateLimit-Reset: The Unix epoch time when the rate limit window resets.
Retry-After: The number of seconds the client must wait before making another request (sent on 429 errors).

Check Your Understanding

1. Which rate limiting algorithm is ideal if you want to handle sudden bursts of traffic gracefully without dropping requests immediately?

2. How are race conditions solved in high-throughput distributed rate limiters querying Redis?

3. What HTTP response status code must a rate limiter return when a client exceeds their quota?