Databases: SQL vs NoSQL

~25 min · Foundations · DDIA §2–3 · Alex Xu Vol 1, Ch 1

Ref

Primary Source

DDIA — Chapters 2 & 3: Data Models and Storage Engines

Kleppmann's treatment of why different data models exist is the best available. Particularly read the section on document vs relational models.

The Golden Rule

Interview rule

Default to a relational database (SQL) unless you have a specific reason not to. SQL databases handle most workloads, are well-understood, and have strong consistency guarantees. Use NoSQL when you have a concrete, justified reason: massive scale, flexible schema, specific data access patterns.

The Four NoSQL Data Models

The four NoSQL data models — each optimized for a different access pattern

SQL vs NoSQL — When to Use Each

Factor	SQL (Relational)	NoSQL
Data structure	Structured, tabular, fixed schema	Flexible, nested, variable schema
Relationships	Strong — JOINs, foreign keys	Weak — denormalization required
Consistency	ACID — strong consistency	Varies — often eventual
Scale	Vertical + limited horizontal (read replicas)	Horizontal by design
Query flexibility	Arbitrary SQL queries	Limited to designed access patterns
Write throughput	Good (up to 10K-100K TPS with tuning)	Excellent (millions of writes/sec in Cassandra)
Operational complexity	Well understood, mature tooling	Varies — Cassandra tuning is hard

Choose SQL When

You have relational data with many entity types and relationships
You need ACID transactions (financial records, inventory)
Your access patterns are diverse and unpredictable (ad-hoc queries)
You're starting a new project (default to relational)
Data is structured and schema is stable

Examples: PostgreSQL, MySQL, SQLite, Amazon Aurora

Choose NoSQL When

You need to scale horizontally to handle massive write throughput
Your data is naturally a document/graph/key-value (e.g., user profiles, social graphs)
Your schema evolves frequently (e.g., product catalog with varying attributes)
You have a very specific access pattern and don't need SQL's flexibility
Eventual consistency is acceptable for your use case

Denormalization — The NoSQL Trade-off

SQL normalizes data to eliminate redundancy. NoSQL denormalizes — duplicates data to optimize for read performance. With Cassandra or DynamoDB, you design your data model around your query patterns, not around entity relationships.

SQL — Normalized

users table: (user_id, name, email)
posts table: (post_id, user_id, content)
likes table: (post_id, user_id)

To get a post with author info:
SELECT posts.*, users.name
FROM posts JOIN users ON posts.user_id = users.user_id
WHERE post_id = 123

NoSQL — Denormalized

// MongoDB document
{
  _id: 123,
  content: "Hello world",
  author_name: "Rax",      // duplicated!
  author_avatar: "...",    // duplicated!
  like_count: 42
}

// No JOIN needed — one read gets everything

Duplication in NoSQL is a feature, not a bug. The cost: updating author names requires updating all their posts. This is the write amplification trade-off.

Check Your Understanding

1. You're building a social network where you need to find "friends of friends" efficiently. Which database type is the best fit?

2. A payment processor needs to ensure that a money transfer is either fully completed or fully rolled back — never partial. Which property ensures this?

3. You're designing a message inbox for a chat app. Each user reads their own messages frequently. Write throughput is high (millions of messages/day). Which database fits best?

🎓 The SQL vs NoSQL question comes up in every interview. Ask me to walk through the database choice for any specific system — URL shortener, Twitter, YouTube — and I'll explain the reasoning.