← Course Index

Design a Rate Limiter

~25 min · Case Studies · Alex Xu Vol 1, Ch 4

Ref

Primary Source

Alex Xu Vol 1, Chapter 4 — "Design a Rate Limiter" (Deep Dive)

Presents a complete end-to-end case study on building a production-grade distributed rate limiter.

Step 1: Clarify Requirements

In the interview, establish the scope immediately:

Functional Requirements:
- Limit requests by user ID, IP address, or API key.
- Return clear HTTP headers indicating limits.
- Reject requests with HTTP 429 Too Many Requests.
Non-Functional Requirements:
- Low Latency: The check must add minimal overhead (< 2–5 ms) to the request path.
- Fault Tolerance: If the rate limiter cluster crashes, it must not take down the entire API (fail open).
- Accuracy: Avoid race conditions where clients bypass limits.

Step 2: Database Schema & Rules Storage

Rate limiting rules are typically written in configuration files (YAML) and loaded into server memory or cached in a relational database for management.

# Example Rate Limiting Rules (YAML)
domain: auth_service
rules:
  - key: user_id
    endpoint: /login
    rate_limit: 5/minute
  - key: ip_address
    endpoint: /signup
    rate_limit: 2/minute

Step 3: Atomic Checks with Redis Lua Scripting

As covered in Lesson 16, race conditions happen when checking and updating the token count are split into separate operations. We solve this using **Redis Lua scripts** which run as a single atomic unit.

Token Bucket Lua Script

-- KEYS[1]: User/IP rate limit key
-- ARGV[1]: Max tokens capacity
-- ARGV[2]: Fill rate (tokens/sec)
-- ARGV[3]: Request cost (usually 1)
-- ARGV[4]: Current timestamp (seconds)

local key = KEYS[1]
local limit = tonumber(ARGV[1])
local rate = tonumber(ARGV[2])
local cost = tonumber(ARGV[3])
local now = tonumber(ARGV[4])

-- Load bucket state
local state = redis.call('HMGET', key, 'tokens', 'last_updated')
local tokens = tonumber(state[1])
local last_updated = tonumber(state[2])

if not tokens then
    -- Init bucket
    tokens = limit
    last_updated = now
else
    -- Add refilled tokens
    local elapsed = now - last_updated
    tokens = math.min(limit, tokens + elapsed * rate)
    last_updated = now
end

-- Allow request?
if tokens >= cost then
    tokens = tokens - cost
    redis.call('HMSET', key, 'tokens', tokens, 'last_updated', last_updated)
    redis.call('EXPIRE', key, 3600) -- Clean up after 1 hr
    return 1 -- Allowed
else
    return 0 -- Rejected
end

How it runs in Redis

The script is uploaded to Redis once:
SCRIPT LOAD "..."
→ Returns sha1_hash

For every incoming API request:
EVALSHA sha1_hash 1 "rate:ip_1.2.3.4" 10 0.5 1 1719211000

Advantages:
1. One network round-trip.
2. Completely atomic — Redis is single-threaded, 
   so no other request can modify token counts 
   in the middle of execution.

Step 4: Handling Failures (Fail-Open vs Fail-Closed)

What happens if the Redis cluster becomes unavailable or slow? Your rate limiter middleware must handle this exception gracefully.

Fail-Open (Preferred): If the rate limiter check throws an error (e.g., Redis timeout), log the warning, and let the request proceed to the application servers. Under normal SRE guidelines, it's better to risk overloading the service slightly than denying service to 100% of legitimate users due to a monitoring failure.
Fail-Closed: Reject all incoming traffic if the rate limiter goes down. Only used for highly sensitive billing endpoints where fraud risks are extreme.

💡 Client Integration: Exponential Backoff & Jitter

When a client receives a 429 status, they should retry using Exponential Backoff with Jitter. Instead of retrying every 1 second (which creates a retry storm that crushes your database), clients double their wait time on each attempt and add a random delay (jitter) to distribute the load.

Check Your Understanding

1. In a production rate limiter, why is a "Fail-Open" strategy typically chosen when the Redis cache becomes unreachable?

2. How does using a Redis Lua script prevent race conditions?

3. Why should client SDKs include "jitter" (random noise) along with exponential backoff when retrying failed requests?