Building Blocks Map

Which component to reach for, and when. The architect's toolkit.

🗄️ Data Storage

Relational Database (SQL)

When: Structured data with relationships, strong consistency, complex queries, ACID needed

The default until you have a reason to use something else. A single well-provisioned machine comfortably handles most workloads.

Examples: PostgreSQL, MySQL · Used in: URL shortener, notification system, chat (message metadata)

Document Store (NoSQL)

When: Flexible schema, nested data, read-heavy workloads, horizontal scaling needed

Good for denormalized data models where you query by a single key or simple predicates.

Examples: MongoDB, DynamoDB · Used in: user profiles, product catalogs, content management

Wide-Column Store

When: Massive write throughput, time-series data, analytics, append-heavy workloads

Optimized for writes and sequential reads. Data is organized by column families.

Examples: Cassandra, HBase · Used in: messaging at scale, activity feeds, IoT telemetry

Key-Value Store

When: Simple lookups by key, caching, session storage, real-time leaderboards

Among the fastest reads available, because each lookup is a direct access by key. No complex queries: just get/set by key.

Examples: Redis, Memcached, DynamoDB · Used in: caching layer, rate limiting counters, session store

Graph Database

When: Highly connected data, relationship traversal, social networks, recommendation engines

Optimized for queries like "friends of friends" that would require expensive JOINs in SQL.

Examples: Neo4j, Amazon Neptune · Used in: social graphs, fraud detection, knowledge graphs
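To make the "friends of friends" query concrete, here is a minimal sketch over an in-memory adjacency map. The `friends_of_friends` function and the sample names are illustrative assumptions, not the API of any graph database; a real graph store would express the same two-hop traversal in its own query language.

```python
def friends_of_friends(graph, user):
    """Return users exactly two hops from `user`.

    `graph` maps each user to the set of their direct friends.
    Direct friends and the user themself are excluded from the result.
    """
    direct = graph.get(user, set())
    two_hops = set()
    for friend in direct:
        two_hops |= graph.get(friend, set())
    return two_hops - direct - {user}


# Hypothetical sample data for illustration only.
graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}
print(sorted(friends_of_friends(graph, "alice")))  # ['dave', 'erin']
```

In a relational schema the same question typically needs a self-join on the friendship table per hop, which is why deeper traversals get expensive there.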

Blob / Object Storage

When: Large unstructured files — images, videos, backups, logs

Cheap, with effectively unlimited capacity. Objects are accessed by key and replaced whole rather than edited in place. Not a database.

Examples: Amazon S3, Google Cloud Storage · Used in: YouTube (video files), Google Drive, image hosting

Search Engine

When: Full-text search, fuzzy matching, faceted filtering, typeahead

Inverted index-based. Complements your primary database — not a replacement.

Examples: Elasticsearch, Apache Solr · Used in: product search, log analysis, autocomplete
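The idea behind an inverted index can be sketched in a few lines: map each term to the set of documents containing it, then intersect those sets at query time. This is a toy illustration under simplifying assumptions (lowercase word tokens, AND-only queries, no ranking or fuzzy matching); the function names are made up for the example and do not mirror Elasticsearch or Solr.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build term -> set of doc ids from a {doc_id: text} mapping."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical product catalog for illustration only.
docs = {1: "Red running shoes", 2: "Blue running jacket", 3: "Red rain jacket"}
index = build_index(docs)
print(search(index, "red jacket"))  # {3}
```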

🌐 Networking & Routing

Load Balancer

When: Distributing traffic across multiple servers (always, once you have > 1 server)

L4 (TCP) for raw speed. L7 (HTTP) when you need content-aware routing (by path, host, or header) or TLS (SSL) termination.

Examples: NGINX, HAProxy, AWS ALB/NLB · Used in: every multi-server system
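As a rough illustration of L7 path-based routing, the sketch below picks a backend pool by longest matching path prefix and rotates through that pool round-robin. `PathRouter` and the backend names are hypothetical; real load balancers add health checks, connection draining, and weighting on top of this.

```python
import itertools


class PathRouter:
    """Toy L7 router: longest path prefix wins, round-robin within a pool."""

    def __init__(self, routes):
        # routes maps a path prefix to a list of backend names.
        self._pools = {
            prefix: itertools.cycle(backends) for prefix, backends in routes.items()
        }
        # Check longer prefixes first so "/api/" beats "/".
        self._prefixes = sorted(routes, key=len, reverse=True)

    def pick(self, path):
        for prefix in self._prefixes:
            if path.startswith(prefix):
                return next(self._pools[prefix])
        raise LookupError(f"no route for {path}")


# Hypothetical backends for illustration only.
router = PathRouter({"/api/": ["api-1", "api-2"], "/": ["web-1"]})
print(router.pick("/api/users"))   # api-1
print(router.pick("/api/orders"))  # api-2
print(router.pick("/index.html"))  # web-1
```

An L4 balancer cannot make this decision, because it never parses the HTTP request and so never sees the path.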

CDN (Content Delivery Network)

When: Serving static content globally, reducing latency for geographically distributed users

Caches static assets at edge locations close to users. Pull-based (lazy) or push-based (pre-populate).

Examples: CloudFront, Cloudflare, Akamai · Used in: YouTube (video delivery), any media-heavy app

API Gateway

When: Centralizing cross-cutting concerns — auth, rate limiting, request routing, protocol translation

Single entry point for all client requests. Routes to appropriate backend services.

Examples: Kong, AWS API Gateway, NGINX · Used in: microservice architectures

DNS

When: Domain resolution (always) + geographic routing, failover, load distribution

The first hop of every request. Can be used for geographic load balancing (GeoDNS).

Examples: Route 53, Cloudflare DNS · Used in: every internet-facing system

⚡ Caching

Application Cache (Redis / Memcached)

When: Frequently accessed data, read-heavy workloads, reducing DB load

In-memory key-value store sitting between your app and database. Cache-aside is the most common pattern.

Examples: Redis, Memcached · Used in: news feed (precomputed timelines), session store, leaderboards
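A minimal sketch of cache-aside, assuming a plain dict as a stand-in for Redis or Memcached and a caller-supplied `load_from_db` function (both are illustrative assumptions): the application checks the cache first, falls back to the database on a miss, and writes the result back with a TTL.

```python
import time


class CacheAside:
    """Cache-aside reads with a TTL; a dict stands in for the cache server."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db      # hypothetical DB read function
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}               # key -> (value, expires_at)

    def get(self, key):
        entry = self._cache.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry[0]            # cache hit
        value = self._load(key)        # cache miss: read the source of truth
        self._cache[key] = (value, self._clock() + self._ttl)
        return value

    def invalidate(self, key):
        """Call after writing to the database so the next read reloads."""
        self._cache.pop(key, None)
```

The usual trade-off applies: between a database write and the invalidation (or TTL expiry), readers can see stale data, so the TTL should match how much staleness the use case tolerates.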

Client-Side Cache

When: Reducing round trips for data that changes infrequently

Browser cache, mobile app cache, DNS cache. Controlled via HTTP headers (Cache-Control, ETag).
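The ETag flow can be sketched as a server-side function: the server tags each response with a fingerprint of the body, and when the client sends that tag back in `If-None-Match`, the server answers 304 with no body. The `respond` helper, the hash choice, and the `max-age=60` value are assumptions for illustration, not a specific framework's API.

```python
import hashlib
from typing import Optional


def respond(body: bytes, if_none_match: Optional[str] = None):
    """Return (status, headers, body) for a GET, honoring If-None-Match."""
    # Assumed ETag scheme: a truncated SHA-256 of the body, in quotes.
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    headers = {"ETag": etag, "Cache-Control": "max-age=60"}
    if if_none_match == etag:
        # Client copy is still current: skip sending the body again.
        return 304, headers, b""
    return 200, headers, body


status, headers, _ = respond(b"hello")
revalidated, _, _ = respond(b"hello", if_none_match=headers["ETag"])
print(status, revalidated)  # 200 304
```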

CDN (as cache)

When: Static assets, pre-rendered pages, video segments

Acts as a distributed cache at the network edge.

📨 Messaging & Async

Message Queue

When: Decoupling producers from consumers, handling traffic spikes, async processing

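Producer enqueues, consumer dequeues. Provides buffering and retry; ordering guarantees depend on the system (for example, Kafka orders messages within a partition, and SQS guarantees order only on FIFO queues).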

Examples: Kafka (log-based), RabbitMQ (traditional), SQS (managed) · Used in: notification system, video processing pipeline
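The buffering-and-retry behavior can be sketched with an in-memory deque standing in for the broker. The `drain` function, the `(message, attempts)` tuple shape, and the dead-letter list are illustrative assumptions rather than the interface of Kafka, RabbitMQ, or SQS.

```python
from collections import deque


def drain(queue, handler, max_attempts=3):
    """Process queued (message, attempts) pairs, retrying failures.

    A message whose handler keeps raising is moved to a dead-letter
    list after `max_attempts` tries instead of blocking the queue.
    """
    dead_letter = []
    while queue:
        message, attempts = queue.popleft()
        try:
            handler(message)
        except Exception:
            if attempts + 1 < max_attempts:
                queue.append((message, attempts + 1))  # retry later
            else:
                dead_letter.append(message)
    return dead_letter


# Hypothetical usage: the producer enqueues, the consumer drains.
queue = deque([("send-email:1", 0), ("send-email:2", 0)])
print(drain(queue, handler=print))  # prints both messages, then []
```

Note that a retried message goes to the back of the queue in this sketch, so it can be processed after messages that were enqueued later.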

Pub/Sub

When: One-to-many delivery, event-driven architectures, real-time updates

Publishers broadcast to topics; multiple subscribers independently consume. Different from point-to-point queues.

Examples: Google Pub/Sub, Kafka topics, Redis Pub/Sub · Used in: real-time notifications, event sourcing
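To contrast with a point-to-point queue, here is a minimal in-process sketch of topic fan-out: every subscriber on a topic receives its own copy of each event. The `Broker` class is a hypothetical stand-in, and it delivers synchronously, whereas real pub/sub systems deliver asynchronously and often durably.

```python
from collections import defaultdict


class Broker:
    """Toy pub/sub broker: each event goes to every subscriber of its topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, event):
        """Deliver the event to all current subscribers; return how many."""
        callbacks = list(self._subscribers.get(topic, ()))
        for callback in callbacks:
            callback(event)
        return len(callbacks)


# Hypothetical subscribers for illustration only.
broker = Broker()
emails, analytics = [], []
broker.subscribe("order.created", emails.append)
broker.subscribe("order.created", analytics.append)
broker.publish("order.created", {"order_id": 42})
print(emails == analytics == [{"order_id": 42}])  # True
```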

Task Queue / Workers

When: CPU-intensive or slow background jobs — image resizing, email sending, report generation

Distribute work across a pool of worker processes. Separate from request handling.

Examples: Celery, Sidekiq, AWS Lambda · Used in: video transcoding, batch analytics
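A worker pool can be sketched with the standard library alone. This assumes jobs live in one process and uses threads for simplicity; systems like Celery or Sidekiq put a broker between the web tier and separate worker processes, and CPU-bound work in Python would normally use processes instead. The `resize` job is a placeholder, not real image code.

```python
from concurrent.futures import ThreadPoolExecutor


def resize(job):
    """Placeholder for a slow job; returns a label instead of resizing."""
    name, width = job
    return f"{name}@{width}px"


def run_jobs(jobs, workers=4):
    """Fan jobs out to a pool of workers and collect results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(resize, jobs))


print(run_jobs([("a.png", 100), ("b.png", 200)]))
# ['a.png@100px', 'b.png@200px']
```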

🔐 Coordination & Consistency

Distributed Lock / Leader Election

When: Only one process should do something at a time (e.g., cron jobs, master election)

Use a coordination service. Be careful — distributed locks are hard to get right.

Examples: ZooKeeper, etcd, Redis (Redlock) · Used in: single-leader database replication, job scheduling
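One reason distributed locks are hard: a holder can pause or crash while it still believes it owns the lock. A common mitigation is a lease (the lock expires after a TTL) combined with a fencing token that increases on every grant. The sketch below is a single-process stand-in for a coordination service; `LeaseLock` and its methods are hypothetical names, not the ZooKeeper, etcd, or Redis API.

```python
class LeaseLock:
    """Toy lease-based lock with fencing tokens (single process only)."""

    def __init__(self, clock):
        self._clock = clock        # function returning the current time
        self._holder = None
        self._expires_at = 0.0
        self._token = 0

    def acquire(self, owner, ttl):
        """Return a new fencing token, or None if the lease is still held."""
        now = self._clock()
        if self._holder is not None and now < self._expires_at:
            return None
        self._holder = owner
        self._expires_at = now + ttl
        self._token += 1           # strictly increasing across grants
        return self._token

    def release(self, owner, token):
        """Release only if the caller still holds the latest grant."""
        if self._holder == owner and self._token == token:
            self._holder = None
            return True
        return False
```

The protected resource should reject any write that carries a token lower than the highest one it has already seen, so a holder whose lease silently expired cannot overwrite newer work.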

Consistent Hashing

When: Distributing data/load across a dynamic set of nodes with minimal redistribution

Hash ring + virtual nodes. Adding or removing a node remaps only about K/N keys on average, where K is the number of keys and N is the number of nodes.

Used in: cache clusters, database sharding, load balancers, CDN routing
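A minimal hash ring with virtual nodes might look like the sketch below. The choice of SHA-256, 100 virtual nodes per server, and the `HashRing` interface are assumptions made for illustration; production implementations also handle replication and weighting.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes=(), vnodes=100):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node owns many points on the ring to even out the load.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def get(self, key):
        """Return the node owning the first ring point at or after the key."""
        if not self._ring:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]  # wrap around the ring


ring = HashRing(["cache-a", "cache-b", "cache-c"])
owner = ring.get("user:42")  # one of the three node names
```

When a node joins, the only keys that change owner are the ones the new node takes over; every other key keeps its previous owner, which is what limits redistribution.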

Unique ID Generator

When: Globally unique, sortable identifiers across distributed systems

UUID v4 (random, no ordering) vs time-ordered IDs such as UUID v7 or Snowflake (timestamp + machine + sequence, sortable). Choose based on whether you need time-ordering.

Examples: Twitter Snowflake, UUID v4/v7 · Used in: every system that creates entities at scale
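The timestamp + machine + sequence layout can be sketched as a 64-bit integer with 41 bits of milliseconds, 10 bits of machine ID, and 12 bits of per-millisecond sequence. The class name, the custom epoch value, and the injectable clock are assumptions for illustration, and the sketch simply raises if the clock moves backwards.

```python
import threading
import time


class SnowflakeGenerator:
    """Toy Snowflake-style IDs: 41-bit ms timestamp | 10-bit machine | 12-bit seq."""

    def __init__(self, machine_id, epoch_ms=1_600_000_000_000, clock_ms=None):
        if not 0 <= machine_id < 1024:
            raise ValueError("machine_id must fit in 10 bits")
        self._machine_id = machine_id
        self._epoch_ms = epoch_ms  # assumed custom epoch, chosen arbitrarily
        self._clock_ms = clock_ms or (lambda: time.time_ns() // 1_000_000)
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self._clock_ms()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & 0xFFF
                if self._sequence == 0:
                    # 4096 IDs used in this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            timestamp = now - self._epoch_ms
            return (timestamp << 22) | (self._machine_id << 12) | self._sequence
```

Because the timestamp occupies the high bits, IDs from one generator sort in creation order, and IDs from different machines sort by time to within clock skew.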

📊 Decision Quick-Reference

🧱 This is your toolkit. Every case study in Phase 3 will pull from these building blocks. When you're in an interview, mentally scan this map: "Do I need a cache here? A queue? What type of database?"