Explores prefix trees (Tries), in-memory query service optimizations, and MapReduce analytics loops to rebuild search data.
As you type in a search box (like Google or Amazon), suggestions appear. In interviews, design for:
A **Trie** (prefix tree) is a tree-like data structure used to store strings. Each node represents a character. The path from the root to a node forms a word prefix.
If a client types "c", a basic Trie query has to:
1. Locate the "c" prefix node.
2. Traverse all child subtrees below "c" to find all complete words.
3. Sort the words by query frequency weights to return the top K (e.g. top 5).
This traversal is O(Number of nodes in subtree), which is far too slow for real-time.
To achieve O(1) prefix lookups, **we precompute and store the top K suggestions directly inside each node**.
• The node representing "c" stores: ["code": 98K, "cool": 85K, "cars": 77K].
• The query engine simply navigates to the prefix node and immediately returns the cached array. This completely avoids traversing the subtree.
Search trends change. However, queries don't need to be aggregated in real-time. Rebuilding the Trie structure on every keypress causes unnecessary write locks. Instead, we divide the system into a **Query Service** and an offline **Data Collection Pipeline**.
a) Client types "sy".
b) API Gateway routes to Query Service.
c) Service checks Redis Cache.
Redis Hit? Return suggestions.
d) Redis Miss? Query Trie Cache (in-memory).
e) Return suggestions to client.
* Time complexity: O(1)
a) Log Service records search events.
b) Aggregation Service runs MapReduce
jobs weekly to aggregate query counts.
c) Trie Builder creates a new Trie on disk
with updated frequency weights.
d) Store Trie files in MongoDB/S3.
e) Trie Cache servers hot-reload
the new Trie files.
localStorage or memory cache) for 1–2 hours. The browser only calls the server if the cached prefix results are expired.a–m go to Server A, and n–z go to Server B.