โ† Back to Course Index

Glossary

The canonical language of this course. Every term, one place.

Scalability & Architecture

Horizontal Scaling (Scale Out)
Adding more machines to a pool to handle increased load. Each machine handles a fraction of the work.
Avoid: scaling sideways, adding servers
Vertical Scaling (Scale Up)
Adding more resources (CPU, RAM, disk) to a single machine. Simpler but has a hard ceiling.
Avoid: making the server bigger
Throughput
The number of operations a system can handle per unit of time, typically measured in QPS (queries per second) or TPS (transactions per second).
Latency
The time between a client sending a request and receiving the response. Measured in milliseconds. Distinct from response time, which includes queuing delays.
Avoid: speed, delay (too vague)
QPS (Queries Per Second)
The rate of incoming requests a system processes. The standard unit for measuring throughput in system design interviews.
Fan-out
The number of downstream requests a single upstream request triggers. "Fan-out on write" pushes data to many at write time; "fan-out on read" assembles it at read time.
See also: News Feed design
Single Point of Failure (SPOF)
A component whose failure brings down the entire system. Eliminating SPOFs is a core goal of distributed system design.

Data & Consistency

CAP Theorem
In a distributed system, you can guarantee at most two of three: Consistency, Availability, Partition tolerance. Since network partitions are unavoidable, the real choice is CP vs AP.
Strong Consistency
Every read returns the most recent write. All nodes see the same data at the same time. Higher latency, but no stale reads.
Avoid: perfect consistency, real-time consistency
Eventual Consistency
Given enough time with no new writes, all replicas converge to the same value. Reads may return stale data temporarily. Lower latency, higher availability.
ACID
Properties of relational database transactions: Atomicity (all or nothing), Consistency (valid state), Isolation (concurrent transactions don't interfere), Durability (committed = permanent).
BASE
NoSQL trade-off model: Basically Available, Soft state, Eventually consistent. The opposite end of the spectrum from ACID.
Sharding (Horizontal Partitioning)
Splitting data across multiple databases, each holding a subset. The shard key determines which shard a record lives on.
Avoid: splitting the database
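A minimal sketch of hash-based shard selection, assuming a string shard key and a fixed number of shards. The function name `shard_for` is hypothetical, and real systems may use range-based or directory-based placement instead.

```python
import hashlib


def shard_for(shard_key: str, num_shards: int) -> int:
    """Map a shard key to a shard index using a stable hash.

    A cryptographic digest is used instead of the built-in hash(),
    which is randomized per process for strings.
    """
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# The same key always lands on the same shard.
print(shard_for("user:42", 4))
```

With plain modulo placement, changing `num_shards` moves most keys to a different shard, which is the problem Consistent Hashing addresses.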
Replication
Keeping copies of the same data on multiple nodes. Leader-follower: one node accepts writes, others replicate. Multi-leader: multiple nodes accept writes.
Quorum
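The minimum number of nodes that must agree for an operation to succeed. For N replicas, choosing write quorum W and read quorum R so that W + R > N makes every read quorum overlap every write quorum, so a read contacts at least one replica that holds the latest acknowledged write.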
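The overlap argument can be checked by brute force for small clusters. This is an illustrative sketch; the helper names `quorums_overlap` and `always_intersect` are hypothetical.

```python
from itertools import combinations


def quorums_overlap(n: int, w: int, r: int) -> bool:
    """The quorum condition: W + R > N."""
    return w + r > n


def always_intersect(n: int, w: int, r: int) -> bool:
    """Exhaustively check that every write set of size w shares
    at least one node with every read set of size r."""
    nodes = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


# N=3, W=2, R=2 satisfies the condition; N=3, W=1, R=1 does not.
print(quorums_overlap(3, 2, 2), always_intersect(3, 2, 2))
print(quorums_overlap(3, 1, 1), always_intersect(3, 1, 1))
```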
Write-Ahead Log (WAL)
An append-only log where every write is recorded before being applied. Enables crash recovery and replication.

Networking & Infrastructure

Load Balancer
Distributes incoming requests across a pool of servers. L4 operates at TCP level; L7 operates at HTTP level and can make content-aware routing decisions.
Reverse Proxy
A server that sits in front of backend servers, forwarding client requests. Provides SSL termination, caching, compression, and security. An L7 load balancer is one kind of reverse proxy.
CDN (Content Delivery Network)
A geographically distributed network of servers that caches static content close to users. Reduces latency for static assets (images, CSS, JS, video).
DNS (Domain Name System)
Translates human-readable domain names into IP addresses. Typically the first step of a web request, unless the address is already cached.
WebSocket
A persistent, full-duplex communication channel over a single TCP connection. Used when the server needs to push data to the client (chat, notifications, live updates).
Avoid: two-way socket, real-time connection
API Gateway
A single entry point for all client requests that handles cross-cutting concerns: authentication, rate limiting, request routing, protocol translation.

Caching

Cache-Aside (Lazy Loading)
Application checks cache first. On miss, reads from DB, writes result to cache, then returns. Most common pattern.
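A minimal sketch of the read path, assuming an in-memory dict stands in for both the cache and the DB. The class name `CacheAside` and the `db_reads` counter are hypothetical, added only to make the behavior visible.

```python
class CacheAside:
    """Read path for cache-aside: check cache, fall back to DB, populate cache."""

    def __init__(self, db: dict):
        self.db = db          # stand-in for a real database
        self.cache = {}       # stand-in for a real cache (e.g. a key-value store)
        self.db_reads = 0     # counts how often the DB was actually hit

    def get(self, key):
        if key in self.cache:             # cache hit
            return self.cache[key]
        self.db_reads += 1                # cache miss: read from the DB
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value       # populate the cache for next time
        return value


store = CacheAside({"user:1": "Ada"})
store.get("user:1")   # miss, reads the DB
store.get("user:1")   # hit, served from the cache
print(store.db_reads)
```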
Write-Through
Every write goes to both the cache and the DB synchronously, before the write is acknowledged. Keeps the cache consistent with the DB but adds write latency.
Write-Back (Write-Behind)
Writes go to cache only; cache asynchronously flushes to DB. Fast writes, but risk of data loss if cache crashes before flush.
Cache Invalidation
The process of removing or updating stale cache entries. Famously one of the two hard problems in computer science.
Eviction Policy
Strategy for removing entries when cache is full. Common: LRU (least recently used), LFU (least frequently used), TTL (time to live).
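As an illustration of LRU, here is a small sketch built on `collections.OrderedDict`. The class name `LRUCache` and its `get`/`put` methods are hypothetical, not a specific library's API.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)      # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")        # "a" is now the most recently used
cache.put("c", 3)     # evicts "b"
print(cache.get("b"))
```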

Messaging & Async

Message Queue
A buffer that decouples producers from consumers. Producers enqueue messages; consumers dequeue and process them asynchronously.
Avoid: job queue (more specific), event bus (different pattern)
Pub/Sub (Publish-Subscribe)
A messaging pattern where publishers send messages to topics, and subscribers receive all messages on topics they've subscribed to. One-to-many delivery.
Backpressure
A mechanism for consumers to signal producers to slow down when overwhelmed. Prevents cascading failures from traffic spikes.
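One simple form of backpressure is a bounded queue that rejects new work when full, so the producer has to slow down, retry later, or shed load. A sketch using the standard library's `queue.Queue`; the helper name `try_enqueue` is hypothetical.

```python
import queue


def try_enqueue(q: queue.Queue, item) -> bool:
    """Try to add an item without blocking.

    Returns False when the queue is full, which is the signal
    for the producer to back off.
    """
    try:
        q.put_nowait(item)
        return True
    except queue.Full:
        return False


jobs = queue.Queue(maxsize=2)
print(try_enqueue(jobs, "job-1"))
print(try_enqueue(jobs, "job-2"))
print(try_enqueue(jobs, "job-3"))  # queue is full at this point
```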
Idempotency
An operation is idempotent if performing it multiple times produces the same result as performing it once. Critical for retry safety in distributed systems.
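A common way to make a non-idempotent operation safe to retry is an idempotency key supplied by the client. The sketch below assumes an in-memory result store; `PaymentService` and `charge` are hypothetical names.

```python
class PaymentService:
    """Makes `charge` retry-safe by remembering results per idempotency key."""

    def __init__(self):
        self.total_charged = 0
        self._results = {}  # idempotency key -> result of the first attempt

    def charge(self, idempotency_key: str, amount: int) -> int:
        if idempotency_key in self._results:
            # A retry of a request already processed: return the stored
            # result instead of charging again.
            return self._results[idempotency_key]
        self.total_charged += amount
        self._results[idempotency_key] = self.total_charged
        return self.total_charged


svc = PaymentService()
svc.charge("req-123", 50)
svc.charge("req-123", 50)   # retried request, no second charge
print(svc.total_charged)
```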

Algorithms & Data Structures

Consistent Hashing
A hashing scheme where adding or removing a node requires remapping, on average, only about K/N keys (K = total keys, N = total nodes). Uses a logical hash ring with virtual nodes for balance.
See also: Lesson 09
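A minimal sketch of a hash ring with virtual nodes, assuming SHA-256 as the hash function and 100 virtual nodes per physical node. Both choices are illustrative, and `HashRing` is a hypothetical name.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes: int = 100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Each physical node is placed at many positions for better balance.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def lookup(self, key: str) -> str:
        if not self._ring:
            raise ValueError("ring is empty")
        # First virtual node at or after the key's position, wrapping around.
        idx = bisect.bisect_right(self._ring, (_hash(key), "")) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.lookup("user:42"))
```

When a node is added, the only keys that move are the ones the new node takes over; every other key keeps its previous owner.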
Bloom Filter
A space-efficient probabilistic data structure that tests set membership. Can have false positives but never false negatives. Used for deduplication, cache filtering.
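A small sketch of the idea, assuming three hash functions derived from SHA-256 with different prefixes. It uses one byte per bit for readability rather than a packed bit array, and the names `BloomFilter` and `might_contain` are hypothetical.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item: str):
        # Derive several bit positions by salting the hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False means "definitely not present"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))


seen = BloomFilter()
seen.add("https://example.com/a")
print(seen.might_contain("https://example.com/a"))
```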
Merkle Tree
A tree of hashes where each parent node is the hash of its children. Enables efficient detection of inconsistencies between replicas by comparing root hashes.
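A sketch of computing a Merkle root over a list of data blocks, assuming SHA-256 and the convention of duplicating the last hash when a level has an odd number of nodes. That convention is one of several in use, and `merkle_root` is a hypothetical name.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(blocks: list) -> bytes:
    """Hash the leaves, then hash pairs upward until one root remains."""
    if not blocks:
        return _h(b"")
    level = [_h(block) for block in blocks]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: duplicate the last hash on odd levels
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


replica_a = [b"row1", b"row2", b"row3"]
replica_b = [b"row1", b"row2", b"row3-stale"]
print(merkle_root(replica_a) == merkle_root(replica_b))
```

When the roots differ, replicas can compare child hashes level by level to find the divergent blocks without transferring all the data.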
Trie (Prefix Tree)
A tree where each node represents a character, and paths from root to nodes form prefixes. The standard data structure for autocomplete and prefix-based search.
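A minimal autocomplete sketch, assuming a dict of children per node. The method names `insert` and `starts_with` are hypothetical.

```python
class Trie:
    def __init__(self):
        self.children = {}    # character -> child Trie node
        self.is_word = False  # True if a word ends at this node

    def insert(self, word: str) -> None:
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def starts_with(self, prefix: str) -> list:
        """Return all stored words that begin with `prefix`, sorted."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        results = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.is_word:
                results.append(text)
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return sorted(results)


trie = Trie()
for word in ["car", "card", "care", "dog"]:
    trie.insert(word)
print(trie.starts_with("car"))
```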
Geohash
A string encoding of latitude/longitude that converts 2D coordinates into a 1D string. Nearby locations share common prefixes, enabling efficient spatial queries.

Reliability & Operations

SLA / SLO / SLI
SLI = metric you measure (e.g., p99 latency). SLO = target value for that metric (e.g., p99 < 200ms). SLA = contractual commitment with consequences if SLO is breached.
Availability
The percentage of time a system is operational. Measured in "nines": 99.9% = 8.76 hours downtime/year, 99.99% = 52.6 minutes/year.
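The downtime figures follow directly from the percentage. A short sketch of the arithmetic, assuming a 365-day year; `downtime_hours_per_year` is a hypothetical helper name.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, assuming a non-leap year


def downtime_hours_per_year(availability_percent: float) -> float:
    """Allowed downtime per year for a given availability percentage."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


for nines in (99.9, 99.99):
    hours = downtime_hours_per_year(nines)
    print(f"{nines}% -> {hours:.2f} hours ({hours * 60:.1f} minutes) per year")
```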
Gossip Protocol
A peer-to-peer protocol where nodes periodically exchange state with random peers. Used for failure detection and state propagation in decentralized systems.
Heartbeat
A periodic signal sent between nodes to confirm liveness. If heartbeats stop, the node is presumed failed.
Circuit Breaker
A pattern that stops calling a failing service after a threshold of failures, preventing cascading failures. Transitions: Closed → Open → Half-Open.
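A minimal sketch of the state machine, assuming a consecutive-failure threshold and a fixed reset timeout. The class, its parameters, and the injectable `clock` are hypothetical; production libraries typically add failure-rate windows and concurrency handling.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock            # injectable so tests can control time
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, func):
        if self.state == "open":
            if self.clock() - self.opened_at >= self.reset_timeout:
                self.state = "half_open"   # allow one trial call through
            else:
                raise RuntimeError("circuit open: call rejected")
        try:
            result = func()
        except Exception:
            self._on_failure()
            raise
        # Success: reset the counter and close the circuit.
        self.failures = 0
        self.state = "closed"
        return result

    def _on_failure(self):
        self.failures += 1
        # A failed trial call, or too many failures, opens the circuit.
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()
```

While the circuit is open, callers fail fast instead of waiting on a service that is unlikely to respond.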
๐Ÿ“ This glossary grows with you. As you complete lessons, new terms will be added here. If a definition feels wrong later, that's learning โ€” we'll update it.