~25 min · Foundations · DDIA §2–3 · Alex Xu Vol 1, Ch 1
Ref
Primary Source
DDIA — Chapters 2 & 3: Data Models and Storage Engines
Kleppmann's treatment of why different data models exist is the best available. Particularly read the section on document vs relational models.
The Golden Rule
Interview rule
Default to a relational database (SQL) unless you have a specific reason not to. SQL databases handle most workloads, are well-understood, and have strong consistency guarantees. Use NoSQL when you have a concrete, justified reason: massive scale, flexible schema, specific data access patterns.
The Four NoSQL Data Models
The four NoSQL data models — each optimized for a different access pattern
SQL vs NoSQL — When to Use Each
Factor
SQL (Relational)
NoSQL
Data structure
Structured, tabular, fixed schema
Flexible, nested, variable schema
Relationships
Strong — JOINs, foreign keys
Weak — denormalization required
Consistency
ACID — strong consistency
Varies — often eventual
Scale
Vertical + limited horizontal (read replicas)
Horizontal by design
Query flexibility
Arbitrary SQL queries
Limited to designed access patterns
Write throughput
Good (up to 10K-100K TPS with tuning)
Excellent (millions of writes/sec in Cassandra)
Operational complexity
Well understood, mature tooling
Varies — Cassandra tuning is hard
Choose SQL When
You have relational data with many entity types and relationships
You need ACID transactions (financial records, inventory)
Your access patterns are diverse and unpredictable (ad-hoc queries)
You're starting a new project (default to relational)
You need to scale horizontally to handle massive write throughput
Your data is naturally a document/graph/key-value (e.g., user profiles, social graphs)
Your schema evolves frequently (e.g., product catalog with varying attributes)
You have a very specific access pattern and don't need SQL's flexibility
Eventual consistency is acceptable for your use case
Denormalization — The NoSQL Trade-off
SQL normalizes data to eliminate redundancy. NoSQL denormalizes — duplicates data to optimize for read performance. With Cassandra or DynamoDB, you design your data model around your query patterns, not around entity relationships.
SQL — Normalized
users table: (user_id, name, email)
posts table: (post_id, user_id, content)
likes table: (post_id, user_id)
To get a post with author info:
SELECT posts.*, users.name
FROM posts JOIN users ON posts.user_id = users.user_id
WHERE post_id = 123
Duplication in NoSQL is a feature, not a bug. The cost: updating author names requires updating all their posts. This is the write amplification trade-off.
Check Your Understanding
1. You're building a social network where you need to find "friends of friends" efficiently. Which database type is the best fit?
2. A payment processor needs to ensure that a money transfer is either fully completed or fully rolled back — never partial. Which property ensures this?
3. You're designing a message inbox for a chat app. Each user reads their own messages frequently. Write throughput is high (millions of messages/day). Which database fits best?
🎓 The SQL vs NoSQL question comes up in every interview. Ask me to walk through the database choice for any specific system — URL shortener, Twitter, YouTube — and I'll explain the reasoning.