
Storage: Blob Storage & Object Stores

~20 min · Advanced Patterns · Alex Xu Vol 2, Ch 9

Primary Source
Alex Xu Vol 2, Chapter 9 — "Design an S3-like Object Storage System"

Covers how cloud providers build massive object stores (such as S3 and GCS) by separating the metadata database from the nodes that hold the binary data.

What is Blob/Object Storage?

In system design, "blobs" (Binary Large Objects) are unstructured data such as videos, images, PDFs, backups, and audio. Traditional filesystems and relational databases scale poorly for massive, multi-terabyte files.

Architecture: Separation of Concerns

A key concept in object storage design is separating the Metadata (file name, owner, size, permissions) from the Actual Data (the raw bytes of the file).
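
The split can be illustrated with a minimal in-memory sketch. Everything here is an assumption made for illustration: `MetadataStore` and `DataStore` are hypothetical stand-ins (plain dictionaries) for a real metadata database and real data nodes, and `put_object` follows the "1. Save Metadata, 2. Save Raw Bytes" order used in this lesson.

```python
import uuid
from dataclasses import dataclass


@dataclass
class ObjectMetadata:
    """Small, queryable record that would live in the metadata database."""
    object_key: str
    file_name: str
    owner: str
    size_bytes: int
    permissions: str


class MetadataStore:
    """Hypothetical stand-in for the metadata DB: a dict keyed by object key."""

    def __init__(self):
        self._rows = {}

    def put(self, meta):
        self._rows[meta.object_key] = meta

    def get(self, object_key):
        return self._rows[object_key]


class DataStore:
    """Hypothetical stand-in for the data nodes: opaque bytes, nothing else."""

    def __init__(self):
        self._blobs = {}

    def write(self, object_key, payload):
        self._blobs[object_key] = payload

    def read(self, object_key):
        return self._blobs[object_key]


def put_object(meta_store, data_store, file_name, owner, payload):
    """Store one object: metadata goes to one system, raw bytes to another."""
    object_key = uuid.uuid4().hex  # unique key shared by both stores
    meta_store.put(
        ObjectMetadata(object_key, file_name, owner, len(payload), "private")
    )
    data_store.write(object_key, payload)
    return object_key
```

The only thing the two stores share is the object key, so each side can be scaled, replicated, and queried independently.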

[Diagram: Client, API Server, Metadata DB, Data Nodes, CDN (reads). Labeled flows: 1. Save Metadata; 2. Save Raw Bytes; Presigned upload (direct).]
Figure: Separate pathways for file metadata (database) vs. raw binary storage nodes

Upload and Download Workflows

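Routing large video or image files through your API servers ties up their bandwidth and memory for the full duration of every transfer. Instead, the API hands the client a Presigned URL (a short-lived, signed link that authorizes one specific operation on one specific object), and the client exchanges bytes directly with object storage: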

1. Write / Upload Workflow
a) Client calls API: "Request upload for cat.mp4".
b) API validates permissions, generates a unique 
   file key, and requests a "Presigned URL" 
   from the storage cluster.
c) API returns the presigned URL to the client.
d) Client uploads the file directly to object 
   storage using the URL.
e) Storage node notifies API via webhook once the upload completes.
f) API updates Metadata DB status to 'Active'.
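
Steps (b) through (d) can be sketched with a simplified signing scheme. This is not the algorithm S3 or any other provider actually uses; it is a minimal HMAC-based illustration of the idea. `SECRET_KEY`, `STORAGE_HOST`, and both function names are hypothetical. The API side calls `generate_presigned_url`, and the storage side calls `verify_presigned_url` before accepting the bytes.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"demo-secret"  # hypothetical: known only to API and storage
STORAGE_HOST = "https://storage.example.com"  # hypothetical endpoint


def _sign(method, object_key, expires_at):
    # The signature covers the HTTP method, the object key, and the expiry,
    # so changing any of them invalidates the URL.
    message = f"{method}\n{object_key}\n{expires_at}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def generate_presigned_url(method, object_key, ttl_seconds=900, now=None):
    """API side: issue a URL valid for one method on one key until expiry."""
    now = int(time.time()) if now is None else now
    expires_at = now + ttl_seconds
    query = urlencode(
        {"expires": expires_at, "signature": _sign(method, object_key, expires_at)}
    )
    return f"{STORAGE_HOST}/{object_key}?{query}"


def verify_presigned_url(method, url, now=None):
    """Storage side: accept the request only if unexpired and untampered."""
    now = int(time.time()) if now is None else now
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    if now > expires_at:
        return False
    object_key = parsed.path.lstrip("/")
    expected = _sign(method, object_key, expires_at)
    return hmac.compare_digest(signature, expected)
```

Because the storage service can verify the signature on its own, the API server never sees the payload. It only issues the URL and later records the completed upload.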
2. Read / Download Workflow
a) Client requests a resource: "Get video /123".
b) API queries Metadata DB to get file path/key.
c) If public or cached, API returns CDN URL.
d) Client fetches the file directly from CDN.
e) Cache miss? CDN fetches from Object Storage
   Data Node, caches it, and serves the client.
f) Result: API servers never stream the file, so they
   stay lightweight and responsive.
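
The read path above can be sketched as a small simulation. All names are hypothetical (`OriginStorage`, `Cdn`, `resolve_download_url`, `CDN_BASE_URL`), and the metadata DB is assumed to be a plain dictionary. The point is only to show that the API returns a URL rather than bytes, and that the CDN reaches object storage only on a cache miss.

```python
CDN_BASE_URL = "https://cdn.example.com"  # hypothetical CDN hostname


class OriginStorage:
    """Hypothetical stand-in for the object storage data nodes."""

    def __init__(self, blobs):
        self._blobs = blobs
        self.fetch_count = 0  # how many times the CDN had to come to origin

    def fetch(self, object_key):
        self.fetch_count += 1
        return self._blobs[object_key]


class Cdn:
    """Hypothetical edge cache: serves from cache, falls back to origin."""

    def __init__(self, origin):
        self._origin = origin
        self._cache = {}

    def get(self, object_key):
        if object_key not in self._cache:
            # Cache miss: fetch from object storage once, then keep a copy.
            self._cache[object_key] = self._origin.fetch(object_key)
        return self._cache[object_key]


def resolve_download_url(metadata_db, resource_id):
    """API side: look up the object key and return a CDN URL, not the bytes."""
    object_key = metadata_db[resource_id]["object_key"]
    return f"{CDN_BASE_URL}/{object_key}"


# Usage: two reads of the same object hit origin storage only once.
origin = OriginStorage({"videos/123.mp4": b"video-bytes"})
cdn = Cdn(origin)
metadata_db = {"123": {"object_key": "videos/123.mp4"}}

download_url = resolve_download_url(metadata_db, "123")
first = cdn.get("videos/123.mp4")   # miss: fetched from origin and cached
second = cdn.get("videos/123.mp4")  # hit: served from the CDN cache
```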

Crucial Optimization Strategies

💡 Interview Tip: YouTube / Google Drive Uploads

When asked to design a file upload system, always mention: "I will bypass the web server for the payload. The client will get a presigned URL and upload the binary directly to S3/Object storage. I'll use multipart upload for resiliency."
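
To make the multipart point concrete, here is a minimal sketch of how a client might split a file into parts and retry only the part that failed. The part size, the retry count, and the `upload_part` callback are assumptions for illustration. A real client would send each part to its own presigned URL and then ask the storage service to assemble them.

```python
def plan_parts(total_size, part_size):
    """Return (part_number, offset, length) tuples covering the whole file."""
    if total_size <= 0 or part_size <= 0:
        raise ValueError("total_size and part_size must be positive")
    parts = []
    offset = 0
    part_number = 1
    while offset < total_size:
        length = min(part_size, total_size - offset)  # last part may be short
        parts.append((part_number, offset, length))
        offset += length
        part_number += 1
    return parts


def upload_with_retries(parts, upload_part, max_attempts=3):
    """Upload each part, retrying a failed part without redoing earlier ones."""
    completed = []
    for part_number, offset, length in parts:
        for attempt in range(1, max_attempts + 1):
            try:
                upload_part(part_number, offset, length)  # hypothetical callback
                completed.append(part_number)
                break
            except IOError:
                if attempt == max_attempts:
                    raise  # give up on this upload after the final attempt
    return completed
```

With 100 MiB parts, a 5 GiB file becomes 52 parts (51 full parts plus one 20 MiB part), so a dropped connection costs at most one part's worth of re-upload rather than the whole file.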

Check Your Understanding

1. What is the main benefit of using a "Presigned URL" for uploading media assets?
2. How does Erasure Coding improve efficiency compared to simple replication in storage systems?
3. A client wants to upload a 5 GB video file. What design mechanism should be used?