Alex Xu Vol 2, Chapter 9 — "Design an S3-like Object Storage System"
Covers how modern cloud providers build massive object stores (S3, GCS) by separating the metadata database from the nodes that hold the binary data.
What is Blob/Object Storage?
In system design, "blobs" (Binary Large Objects) are unstructured data such as videos, images, PDFs, backups, and audio. Traditional file systems and relational databases scale poorly once this data reaches multi-terabyte volumes and billions of files. The three main storage models compare as follows:
File Storage: Hierarchical folders. Becomes slow and difficult to manage with billions of files.
Block Storage: Raw disk blocks (e.g. SAN, database storage). Extremely fast for databases, but expensive and rigid.
Object Storage: Flat structure. Each file is an "object" with a unique ID, content, and metadata, accessible via simple HTTP APIs (like AWS S3). It is highly scalable and cost-effective.
Architecture: Separation of Concerns
A key concept in object storage design is separating the Metadata (file name, owner, size, permissions) from the Actual Data (the raw bytes of the file).
Figure: separate pathways for file metadata (metadata database) and raw binary data (storage nodes).
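The split between metadata and raw bytes can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: an in-memory SQLite table stands in for the metadata database, a plain dict stands in for the data nodes, and the names `put_object`, `get_object`, and the column layout are hypothetical rather than anything the chapter prescribes.

```python
import hashlib
import sqlite3
import uuid

# Hypothetical stand-ins: SQLite plays the metadata database,
# a dict plays the data nodes that hold raw bytes.
metadata_db = sqlite3.connect(":memory:")
metadata_db.execute(
    """CREATE TABLE objects (
           object_key TEXT PRIMARY KEY,
           file_name  TEXT NOT NULL,
           owner      TEXT NOT NULL,
           size_bytes INTEGER NOT NULL,
           checksum   TEXT NOT NULL
       )"""
)
data_store: dict[str, bytes] = {}


def put_object(file_name: str, owner: str, payload: bytes) -> str:
    """Write the bytes to the data store and the description to the metadata DB."""
    object_key = uuid.uuid4().hex  # opaque ID; the data store never sees the file name
    data_store[object_key] = payload
    metadata_db.execute(
        "INSERT INTO objects VALUES (?, ?, ?, ?, ?)",
        (object_key, file_name, owner, len(payload),
         hashlib.sha256(payload).hexdigest()),
    )
    return object_key


def get_object(object_key: str) -> tuple[dict, bytes]:
    """Look up the metadata row first, then fetch the bytes by key."""
    row = metadata_db.execute(
        "SELECT file_name, owner, size_bytes FROM objects WHERE object_key = ?",
        (object_key,),
    ).fetchone()
    if row is None:
        raise KeyError(object_key)
    meta = {"file_name": row[0], "owner": row[1], "size_bytes": row[2]}
    return meta, data_store[object_key]
```

Because the two stores only share the object key, each can be scaled, replicated, or replaced independently, which is the point of the separation.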
Upload and Download Workflows
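Routing large video or image files through your API servers consumes a lot of bandwidth and memory. Instead, the client transfers the bytes directly to and from storage using Presigned URLs, which are short-lived URLs carrying a signature that authorizes one specific operation on one specific object: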
1. Write / Upload Workflow
a) Client calls API: "Request upload for cat.mp4".
b) API validates permissions, generates a unique file key, and requests a "Presigned URL" from the storage cluster.
c) API returns the presigned URL to the client.
d) Client uploads the file directly to object storage using the URL.
e) Storage node notifies API via webhook once done.
f) API updates Metadata DB status to 'Active'.
2. Read / Download Workflow
a) Client requests a resource: "Get video /123".
b) API queries Metadata DB to get file path/key.
c) If public or cached, API returns CDN URL.
d) Client fetches the file directly from CDN.
e) On a cache miss, the CDN fetches the file from the Object Storage Data Node, caches it, and serves the client.
Result: the API servers never handle the file bytes, which keeps them lightweight and responsive.
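The download workflow above can be sketched in a few lines. The metadata rows, the CDN hostname `cdn.example.com`, and the dicts standing in for the CDN cache and the data node are hypothetical. How private objects are served is not covered by the steps above, so this sketch simply refuses them.

```python
from dataclasses import dataclass

CDN_BASE = "https://cdn.example.com"  # hypothetical CDN hostname


@dataclass
class ObjectRecord:
    object_key: str
    status: str      # 'Active' once the upload has completed
    is_public: bool


# Hypothetical metadata rows keyed by resource ID
METADATA = {
    "123": ObjectRecord("videos/123/cat.mp4", "Active", True),
    "456": ObjectRecord("videos/456/draft.mp4", "Pending", True),
}
ORIGIN = {"videos/123/cat.mp4": b"meow"}  # stand-in for the data node
cdn_cache: dict[str, bytes] = {}          # stand-in for the CDN edge cache


def resolve_download_url(resource_id: str) -> str:
    """API side: look up the key in metadata and hand back a CDN URL."""
    record = METADATA.get(resource_id)
    if record is None or record.status != "Active":
        raise LookupError(f"resource {resource_id} is not available")
    if not record.is_public:
        # Assumption: a real design might return a short-lived signed URL here.
        raise PermissionError(f"resource {resource_id} is private")
    return f"{CDN_BASE}/{record.object_key}"


def cdn_fetch(object_key: str) -> tuple[bytes, str]:
    """CDN side: serve from cache, or pull from the origin on a miss."""
    if object_key in cdn_cache:
        return cdn_cache[object_key], "hit"
    payload = ORIGIN[object_key]      # cache miss: fetch from object storage
    cdn_cache[object_key] = payload   # cache for later requests
    return payload, "miss"
```

Note that `resolve_download_url` returns only a short string. The file bytes flow between the client, the CDN, and the data node.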
Crucial Optimization Strategies
Multipart Uploads: Large files (e.g. >100MB) should be split into smaller chunks and uploaded in parallel. If one chunk fails, only that chunk is retried. Once all chunks finish, the storage service reassembles them.
Metadata DB Choice: At extreme scale, the metadata database must support fast lookups of a file by its key or name. Relational databases (like PostgreSQL) are typically used when structure and ACID transactions matter, while NoSQL key-value or wide-column stores (like DynamoDB or Cassandra) fit when lookup patterns are simple.
Data Replication (Erasure Coding): Rather than keeping 3 full copies of the data (which is expensive), modern storage services use Erasure Coding. They split a file into M data blocks and compute N parity blocks from them. Any M of the M+N blocks can reconstruct the original file, so up to N blocks can be lost. With 8 data blocks and 4 parity blocks, for example, the extra storage is 4/8 = 50% of the original size, compared with 200% for 3x replication, while durability stays high.
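The overhead arithmetic and the recovery idea can be checked with a small sketch. It uses the simplest possible erasure code, a single XOR parity block (the N = 1 case), which can repair one lost block. Production systems commonly use Reed-Solomon codes to support several parity blocks, and this sketch does not implement that. All function names are hypothetical.

```python
from functools import reduce
from typing import Optional


def storage_overhead(data_blocks: int, parity_blocks: int) -> float:
    """Extra storage as a fraction of the original data size."""
    return parity_blocks / data_blocks


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(payload: bytes, m: int) -> list[bytes]:
    """Split payload into m equal data blocks and append one XOR parity block."""
    block_size = -(-len(payload) // m)  # ceiling division
    padded = payload.ljust(block_size * m, b"\0")
    data = [padded[i * block_size:(i + 1) * block_size] for i in range(m)]
    return data + [xor_blocks(data)]


def recover(blocks: list[Optional[bytes]]) -> list[bytes]:
    """Rebuild at most one missing block (marked None) from the survivors."""
    missing = [i for i, block in enumerate(blocks) if block is None]
    if len(missing) > 1:
        raise ValueError("a single parity block can repair only one lost block")
    repaired = list(blocks)
    if missing:
        survivors = [block for block in blocks if block is not None]
        # XOR of all surviving blocks equals the missing one.
        repaired[missing[0]] = xor_blocks(survivors)
    return repaired
```

For the overhead comparison, `storage_overhead(8, 4)` gives 0.5, and 3x replication corresponds to `storage_overhead(1, 2)`, which gives 2.0 (one original plus two extra copies).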
💡 Interview Tip: YouTube / Google Drive Uploads
When asked to design a file upload system, always mention: "I will bypass the web server for the payload. The client will get a presigned URL and upload the binary directly to S3/Object storage. I'll use multipart upload for resiliency."
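The multipart upload mentioned in the tip can be sketched as follows: split the payload into parts, upload them in parallel, and retry only the part that failed. The `upload_part` callback is a hypothetical stand-in for whatever call sends one part to storage. The part size, worker count, and retry limit are arbitrary illustration values.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

UploadPart = Callable[[int, bytes], None]  # (part_number, part_bytes)


def split_into_parts(payload: bytes, part_size: int) -> list[bytes]:
    """Cut the payload into consecutive parts of at most part_size bytes."""
    return [payload[i:i + part_size] for i in range(0, len(payload), part_size)]


def upload_with_retry(number: int, part: bytes,
                      upload_part: UploadPart, max_attempts: int) -> None:
    """Retry a single part without touching the others."""
    for attempt in range(1, max_attempts + 1):
        try:
            upload_part(number, part)
            return
        except ConnectionError:
            if attempt == max_attempts:
                raise


def upload_multipart(payload: bytes, part_size: int, upload_part: UploadPart,
                     max_attempts: int = 3, workers: int = 4) -> int:
    """Upload all parts in parallel and return the number of parts sent."""
    parts = split_into_parts(payload, part_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(upload_with_retry, number, part, upload_part, max_attempts)
            for number, part in enumerate(parts, start=1)
        ]
        for future in futures:
            future.result()  # re-raises if a part exhausted its retries
    return len(parts)


def assemble(received: dict[int, bytes]) -> bytes:
    """Storage side: join the parts in part-number order."""
    return b"".join(received[number] for number in sorted(received))
```

Part numbers matter because parallel uploads can finish in any order. The storage side reassembles by number, not by arrival time.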
Check Your Understanding
1. What is the main benefit of using a "Presigned URL" for uploading media assets?
2. How does Erasure Coding improve efficiency compared to simple replication in storage systems?
3. A client wants to upload a 5 GB video file. What design mechanism should be used?