Explains how full-text inverted indexes work under the hood and explores spatial indexing algorithms like Geohash and Quadtrees.
If you have a database of millions of products or posts, running a relational search like SELECT * FROM posts WHERE content LIKE '%system design%' is highly inefficient. It triggers a full table scan, loading every record from disk to memory to evaluate the string matching, causing high CPU load and latency.
Similarly, for finding nearby drivers (like Uber) or restaurants (like Yelp), running a 2D range query like WHERE lat BETWEEN x1 AND x2 AND lng BETWEEN y1 AND y2 is slow because traditional database indexes are one-dimensional B-Trees.
An **Inverted Index** is a data structure mapping words (terms) to their locations in document files. This is the core technology behind search engines like Elasticsearch and Apache Solr.
Doc 1: "System design is fun"
Doc 2: "Learn system architecture"
Doc 3: "Fun scaling patterns"
Slow to search: must scan every
document to find who contains "Learn".
"architecture" → [Doc 2]
"design" → [Doc 1, Doc 2]
"fun" → [Doc 1, Doc 3]
"learn" → [Doc 2]
"scaling" → [Doc 3]
"system" → [Doc 1, Doc 2]
To search geospatial coordinate data efficiently, we need a way to index 2D points (latitude, longitude) into a 1D indexable key. Two common methods are Geohashes and Quadtrees.
| Geospatial Pattern | How it Works | Best For |
|---|---|---|
| Geohash | Divides the Earth into a grid, assigning a Base32 string label to each grid. Longer string prefix = higher resolution (e.g. 9q8yy). |
Easy caching, simple database indexes (finding matching string prefixes), distributed scaling. |
| Quadtree | An in-memory tree structure. A node splits into 4 child quadrants when the number of points (users/cars) in its area exceeds a threshold. | Dynamic grids (e.g., highly populated cities have dense node splits, rural areas are represented by a single large node). Good for Uber/Lyft. |
| Google S2 | Maps 2D coordinates to a 1D Hilbert space-filling curve. | Highly complex coverage math, polygon intersections (like Uber geofencing). |
For a Yelp-like system (restaurants are static, locations don't change), Geohashes stored in standard databases are perfect. For an Uber-like system (drivers are constantly moving, dynamic density), an in-memory Quadtree cluster is a much stronger design recommendation.