Design Search Autocomplete

~25 min · Case Studies · Alex Xu Vol 1, Ch 13

Ref

Primary Source

Alex Xu Vol 1, Chapter 13 — "Design Search Autocomplete System"

Explores prefix trees (Tries), in-memory query service optimizations, and MapReduce analytics loops to rebuild search data.

What is Search Autocomplete?

As you type in a search box (like Google or Amazon), suggestions appear. In interviews, design for:

Low Latency: Suggestions must appear in **< 100 ms** to feel real-time. If it takes longer, it lags behind typing.
Relevance: Suggestions must correspond to popular search queries.
Scale: Hundreds of thousands of query requests per second.

Core Data Structure: The Trie

A **Trie** (prefix tree) is a tree-like data structure used to store strings. Each node represents a character. The path from the root to a node forms a word prefix.

A Prefix Trie: Nodes represent characters, and each node caches precomputed top search matches

The Prefix Query Bottleneck:

If a client types "c", a basic Trie query has to:
1. Locate the "c" prefix node.
2. Traverse all child subtrees below "c" to find all complete words.
3. Sort the words by query frequency weights to return the top K (e.g. top 5).
This traversal is O(Number of nodes in subtree), which is far too slow for real-time.

The Optimization: Cache Top K at Each Node

To achieve O(1) prefix lookups, **we precompute and store the top K suggestions directly inside each node**.
• The node representing "c" stores: ["code": 98K, "cool": 85K, "cars": 77K].
• The query engine simply navigates to the prefix node and immediately returns the cached array. This completely avoids traversing the subtree.

Data Collection Pipeline

Search trends change. However, queries don't need to be aggregated in real-time. Rebuilding the Trie structure on every keypress causes unnecessary write locks. Instead, we divide the system into a **Query Service** and an offline **Data Collection Pipeline**.

1. Real-Time Query Service

a) Client types "sy".
b) API Gateway routes to Query Service.
c) Service checks Redis Cache.
   Redis Hit? Return suggestions.
d) Redis Miss? Query Trie Cache (in-memory).
e) Return suggestions to client.
* Time complexity: O(1)

2. Offline Collection & Build

a) Log Service records search events.
b) Aggregation Service runs MapReduce 
   jobs weekly to aggregate query counts.
c) Trie Builder creates a new Trie on disk 
   with updated frequency weights.
d) Store Trie files in MongoDB/S3.
e) Trie Cache servers hot-reload 
   the new Trie files.

Further Optimizations

Browser Caching: Cache suggestions in the browser (e.g. localStorage or memory cache) for 1–2 hours. The browser only calls the server if the cached prefix results are expired.
Trie Partitioning: If the Trie becomes too large for a single machine's RAM, shard it alphabetically. For example, queries starting with a–m go to Server A, and n–z go to Server B.

Check Your Understanding

1. In an autocomplete system, how do we prevent having to traverse the entire Trie subtree at query time?

2. Why does the data collection service aggregate query frequencies offline (weekly/nightly) instead of updates in real-time?

3. If the autocomplete Trie is too large to fit in a single cache server's memory, how is it scaled?