← Course Index

Databases: SQL vs NoSQL

~25 min · Foundations · DDIA §2–3 · Alex Xu Vol 1, Ch 1

Ref
Primary Source
DDIA — Chapters 2 & 3: Data Models and Storage Engines

Kleppmann's treatment of why different data models exist is the best available. Particularly read the section on document vs relational models.

The Golden Rule

Interview rule

Default to a relational database (SQL) unless you have a specific reason not to. SQL databases handle most workloads, are well-understood, and have strong consistency guarantees. Use NoSQL when you have a concrete, justified reason: massive scale, flexible schema, specific data access patterns.

The Four NoSQL Data Models

Key-Value Redis, DynamoDB "user:123" → {name: "Rax"} ✓ Fastest reads/writes ✓ Simple access pattern ✗ No complex queries ✗ No relationships Document MongoDB, Firestore { _id: 123, name: "Rax", tags: ["js","py"] } ✓ Flexible schema ✓ Nested data ✗ Weak JOIN support Wide-Column Cassandra, HBase row_key | col_family user123 | profile: name user123 | stats: posts ✓ Massive write scale ✓ Horizontal partition ✗ Limited query patterns ✗ No ad-hoc queries Graph Neo4j, Amazon Neptune User A User B User C ✓ Relationship traversal ✓ "Friends of friends" ✗ Specialized use case
The four NoSQL data models — each optimized for a different access pattern

SQL vs NoSQL — When to Use Each

FactorSQL (Relational)NoSQL
Data structureStructured, tabular, fixed schemaFlexible, nested, variable schema
RelationshipsStrong — JOINs, foreign keysWeak — denormalization required
ConsistencyACID — strong consistencyVaries — often eventual
ScaleVertical + limited horizontal (read replicas)Horizontal by design
Query flexibilityArbitrary SQL queriesLimited to designed access patterns
Write throughputGood (up to 10K-100K TPS with tuning)Excellent (millions of writes/sec in Cassandra)
Operational complexityWell understood, mature toolingVaries — Cassandra tuning is hard

Choose SQL When

Examples: PostgreSQL, MySQL, SQLite, Amazon Aurora

Choose NoSQL When

Denormalization — The NoSQL Trade-off

SQL normalizes data to eliminate redundancy. NoSQL denormalizes — duplicates data to optimize for read performance. With Cassandra or DynamoDB, you design your data model around your query patterns, not around entity relationships.

SQL — Normalized
users table: (user_id, name, email)
posts table: (post_id, user_id, content)
likes table: (post_id, user_id)

To get a post with author info:
SELECT posts.*, users.name
FROM posts JOIN users ON posts.user_id = users.user_id
WHERE post_id = 123
NoSQL — Denormalized
// MongoDB document
{
  _id: 123,
  content: "Hello world",
  author_name: "Rax",      // duplicated!
  author_avatar: "...",    // duplicated!
  like_count: 42
}

// No JOIN needed — one read gets everything

Duplication in NoSQL is a feature, not a bug. The cost: updating author names requires updating all their posts. This is the write amplification trade-off.

Check Your Understanding

1. You're building a social network where you need to find "friends of friends" efficiently. Which database type is the best fit?
2. A payment processor needs to ensure that a money transfer is either fully completed or fully rolled back — never partial. Which property ensures this?
3. You're designing a message inbox for a chat app. Each user reads their own messages frequently. Write throughput is high (millions of messages/day). Which database fits best?

🎓 The SQL vs NoSQL question comes up in every interview. Ask me to walk through the database choice for any specific system — URL shortener, Twitter, YouTube — and I'll explain the reasoning.