
Design Google Drive / Dropbox

~20 min · Case Studies · Alex Xu Vol 1, Ch 15

Primary Source: Alex Xu Vol 1, Chapter 15, "Design Google Drive"

Covers designing a highly consistent file storage system, chunk-level synchronization, deduplication strategies, and client sync states.

What is Google Drive / Dropbox?

A cloud file storage and synchronization service lets users upload, edit, and sync files across multiple devices. The key design challenges are minimizing network bandwidth and using storage efficiently as files are repeatedly rewritten.

Core Storage Pattern: Chunking & Deduplication

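Uploading the entire file on every edit wastes network bandwidth and disk storage, because most of the file's bytes have not changed. We solve this with **Chunking** (split each file into blocks and sync only the blocks that changed) and **Deduplication** (store each unique block once, identified by its hash):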

[Figure: report.pdf (12MB) is split into Block A, Block B, and Block C (4MB each). Each block passes a deduplication check; new chunks are uploaded to the Block Store (S3), and the block hashes are linked in the Metadata Database.]
Chunking & Deduplication: Files are split into blocks, hashed, checked for duplication, and saved in block storage
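The flow in the figure can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by Google Drive or Dropbox: the in-memory dictionaries `block_store` and `metadata` stand in for the block store (S3) and the metadata database, the function names are hypothetical, and SHA-256 is an assumed choice of hash.

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 4 * 1024 * 1024  # 4MB blocks, matching the figure


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """Split file contents into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def upload_file(
    name: str,
    data: bytes,
    block_store: Dict[str, bytes],    # stand-in for the block store (assumption)
    metadata: Dict[str, List[str]],   # stand-in for the metadata DB (assumption)
    chunk_size: int = CHUNK_SIZE,
) -> int:
    """Store only blocks not seen before; return how many blocks were uploaded."""
    uploaded = 0
    hashes = []
    for chunk in split_into_chunks(data, chunk_size):
        digest = hashlib.sha256(chunk).hexdigest()  # assumed hash function
        if digest not in block_store:               # deduplication check
            block_store[digest] = chunk             # upload new chunk
            uploaded += 1
        hashes.append(digest)
    metadata[name] = hashes                         # link block hashes to the file
    return uploaded


# Tiny demo with 4-byte blocks so the behavior is easy to follow.
store: Dict[str, bytes] = {}
meta: Dict[str, List[str]] = {}
first = upload_file("report.pdf", b"AAAABBBBCCCC", store, meta, chunk_size=4)
second = upload_file("report.pdf", b"AAAAXXXXCCCC", store, meta, chunk_size=4)
print(first, second, len(store))
```

In the demo, the first upload stores all three blocks, while the edited version uploads only the one block that changed. Because the sketch uses fixed-size blocks, this holds for edits that change bytes in place.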

Data Structure & Synchronization

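A file sync system separates the data into two locations: the block store (for example, S3), which holds the file chunks, and the metadata database, which records each file's version and the block hashes that make it up.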

Sync conflict resolution:

If two users edit the same file offline and sync at the same time:
• The first user to sync succeeds, updating the file version from V1 to V2 in the metadata DB.
• The second user's sync fails because their local state expects V1 but the server is now at V2.
• The conflict service notifies the second user and saves their version as a local duplicate copy (e.g. report (Conflict).pdf) so the two versions can be merged manually.
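The steps above amount to a version check at commit time. The sketch below illustrates that idea under stated assumptions: `MetadataDB`, `SyncResult`, `commit`, and `conflict_copy_name` are hypothetical names, and the version counter lives in a plain dictionary rather than a real database.

```python
import os
from dataclasses import dataclass
from typing import Dict, Optional


def conflict_copy_name(name: str) -> str:
    """Build a duplicate name such as 'report (Conflict).pdf'."""
    stem, ext = os.path.splitext(name)
    return f"{stem} (Conflict){ext}"


@dataclass
class SyncResult:
    accepted: bool
    server_version: int
    conflict_copy: Optional[str] = None


class MetadataDB:
    """In-memory stand-in for the metadata DB (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.versions: Dict[str, int] = {}

    def commit(self, name: str, expected_version: int) -> SyncResult:
        # In a real database this check-and-update would need to be atomic.
        current = self.versions.get(name, 0)
        if current != expected_version:
            # The client edited a stale version: reject and suggest a copy name.
            return SyncResult(False, current, conflict_copy_name(name))
        self.versions[name] = current + 1
        return SyncResult(True, current + 1)


db = MetadataDB()
db.versions["report.pdf"] = 1                               # both users start from V1
user_a = db.commit("report.pdf", expected_version=1)        # first sync: V1 -> V2
user_b = db.commit("report.pdf", expected_version=1)        # second sync: stale
print(user_a.accepted, user_b.accepted, user_b.conflict_copy)
```

The second commit fails because it still expects V1 while the server is at V2, so the client would keep that user's edits under the conflict copy name for a manual merge.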

Check Your Understanding

1. In a file sync service, what is the main benefit of block-level chunking?
2. How does block-level deduplication optimize disk storage?
3. When a sync conflict occurs because two users updated the same file concurrently, how does Google Drive/Dropbox resolve it?