G7 · Design Google Search Index (Inverted Index)
Verified source
A Google-infrastructure classic, asked in Search team onsites. Credibility: A.
Key decisions
- **Shard by document**: each shard holds a complete inverted index over its own subset of documents; a query is scattered to every shard, and the per-shard top results are gathered and merged.
- **Tiered storage**: a hot tier in RAM and SSD, a cold tier on disk. The index is updated incrementally, in the style of Google's Caffeine system, rather than rebuilt in large batches.
- **Query planner**: decides how to evaluate the query (AND, OR, or phrase match) and prunes posting-list traversal with skip pointers.
- **Two-phase ranking**: cheap BM25 retrieval of the top-K candidates, then an expensive learned re-ranker applied only to those K.
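A minimal sketch of the shard-by-document, scatter/gather decision, under toy assumptions: documents are routed by `doc_id % num_shards` (a hypothetical routing rule), and each shard ranks hits by raw term frequency as a stand-in for real scoring. `Shard` and `ShardedIndex` are illustrative names, not a real API.

```python
import heapq
from collections import Counter, defaultdict
from itertools import chain


class Shard:
    """A complete mini inverted index over the documents routed to this shard."""

    def __init__(self) -> None:
        # term -> {doc_id: term frequency}
        self.postings: dict[str, dict[int, int]] = defaultdict(dict)

    def add(self, doc_id: int, text: str) -> None:
        for term, tf in Counter(text.lower().split()).items():
            self.postings[term][doc_id] = tf

    def top_k(self, term: str, k: int) -> list[tuple[int, int]]:
        # Local top-k as (score, doc_id); raw term frequency stands in for real scoring.
        hits = ((tf, doc_id) for doc_id, tf in self.postings.get(term, {}).items())
        return heapq.nlargest(k, hits)


class ShardedIndex:
    def __init__(self, num_shards: int) -> None:
        self.shards = [Shard() for _ in range(num_shards)]

    def add(self, doc_id: int, text: str) -> None:
        # Hypothetical routing rule: doc_id modulo the number of shards.
        self.shards[doc_id % len(self.shards)].add(doc_id, text)

    def search(self, term: str, k: int) -> list[tuple[int, int]]:
        # Scatter: each shard computes its local top-k (sequential here; parallel RPCs in practice).
        partials = [shard.top_k(term.lower(), k) for shard in self.shards]
        # Gather: merge the local lists and keep the global top-k.
        return heapq.nlargest(k, chain.from_iterable(partials))


idx = ShardedIndex(num_shards=3)
for doc_id, text in [(1, "cat cat dog"), (2, "cat"), (3, "dog dog dog"), (4, "cat cat cat")]:
    idx.add(doc_id, text)
print(idx.search("cat", k=2))  # [(3, 4), (2, 1)]
```

Because every shard returns its own top-k, the global top-k is guaranteed to lie in the union of those lists, so the gather step only merges num_shards × k entries instead of every match.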
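The query-planner decision can be illustrated with the textbook skip-pointer intersection for an AND query. This sketch assumes sorted, duplicate-free integer doc-id lists and places implicit skip pointers every ⌊√n⌋ positions, a common heuristic rather than anything known to be Google-specific. `and_query` adds one simple planning choice (intersect the shortest lists first); all function names are hypothetical.

```python
import math


def _advance(postings: list[int], i: int, step: int, target: int) -> int:
    """Move past postings[i] (< target), following skip pointers that do not overshoot."""
    jumped = False
    # A skip pointer exists at every index that is a multiple of `step`.
    while i % step == 0 and i + step < len(postings) and postings[i + step] <= target:
        i += step
        jumped = True
    return i if jumped else i + 1


def intersect_with_skips(a: list[int], b: list[int]) -> list[int]:
    """AND of two sorted, duplicate-free posting lists."""
    step_a = max(1, math.isqrt(len(a)))  # implicit skip every ~sqrt(n) entries
    step_b = max(1, math.isqrt(len(b)))
    i = j = 0
    out: list[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i = _advance(a, i, step_a, b[j])
        else:
            j = _advance(b, j, step_b, a[i])
    return out


def and_query(posting_lists: list[list[int]]) -> list[int]:
    """Planner heuristic: intersect the shortest lists first to keep intermediates small."""
    if not posting_lists:
        return []
    ordered = sorted(posting_lists, key=len)
    result = ordered[0]
    for plist in ordered[1:]:
        if not result:
            break  # early exit: nothing can match any more
        result = intersect_with_skips(result, plist)
    return result


print(and_query([list(range(1, 101)), [3, 50, 99, 200], list(range(0, 101, 3))]))  # [3, 99]
```

Skips only pay off when the other list is much sparser; with a skip every √n entries, each jump costs one comparison and skips up to √n postings.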
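A sketch of the two-phase ranking decision. Phase 1 uses the standard BM25 formula with the common defaults k1 = 1.2 and b = 0.75 and the non-negative IDF variant `log(1 + (N - n + 0.5) / (n + 0.5))`; for brevity it scans every document instead of walking posting lists. Phase 2 takes a caller-supplied `rerank_score` function, a hypothetical placeholder for the learned re-ranker.

```python
import heapq
import math
from collections import Counter
from typing import Callable


def bm25_top_k(query: str, docs: list[str], k: int,
               k1: float = 1.2, b: float = 0.75) -> list[tuple[float, int]]:
    """Phase 1: cheap BM25 retrieval; returns up to k (score, doc_index) pairs."""
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    df = Counter(term for toks in tokenized for term in set(toks))  # document frequency
    scored = []
    for idx, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = 0.0
        for q in query.lower().split():
            if tf[q] == 0:
                continue
            idf = math.log(1 + (n_docs - df[q] + 0.5) / (df[q] + 0.5))
            norm = k1 * (1 - b + b * len(toks) / avgdl)  # document-length normalization
            score += idf * tf[q] * (k1 + 1) / (tf[q] + norm)
        if score > 0:
            scored.append((score, idx))
    return heapq.nlargest(k, scored)


def two_phase_search(query: str, docs: list[str], k: int, final_n: int,
                     rerank_score: Callable[[str, str], float]) -> list[int]:
    """Phase 2: run the expensive scorer only on the k BM25 candidates."""
    candidates = bm25_top_k(query, docs, k)
    reranked = sorted(candidates, key=lambda c: rerank_score(query, docs[c[1]]), reverse=True)
    return [idx for _, idx in reranked[:final_n]]


docs = ["the cat sat", "dog barks", "cat cat cat on the mat", "a bird"]
print([idx for _, idx in bm25_top_k("cat", docs, k=2)])  # [2, 0]
# Toy stand-in for a learned re-ranker: prefer shorter documents.
print(two_phase_search("cat", docs, k=2, final_n=1, rerank_score=lambda q, d: -len(d)))  # [0]
```

The point of the split is cost: the expensive model runs k times per query no matter how large the corpus is, while BM25 stays cheap enough to evaluate over posting lists.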
Follow-ups
- Freshness vs. scale? Maintain a separate fresh-document index and merge it into the main index on a schedule.
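A minimal sketch of the fresh-index answer, assuming whole-document upserts and no deletions (a real system would also need tombstones for deleted documents). A document's latest version lives in the small fresh index and shadows its stale copy in the main index until the scheduled merge runs. `TwoLevelIndex` and its methods are hypothetical names.

```python
from collections import defaultdict


class TwoLevelIndex:
    """Large main index merged on a schedule, plus a small fresh index updated continuously."""

    def __init__(self) -> None:
        self.main_postings: dict[str, set[int]] = defaultdict(set)
        self.main_terms: dict[int, set[str]] = {}  # forward index: needed to un-index on merge
        self.fresh_postings: dict[str, set[int]] = defaultdict(set)
        self.fresh_terms: dict[int, set[str]] = {}

    def upsert(self, doc_id: int, text: str) -> None:
        # New or changed documents touch only the small fresh index.
        for term in self.fresh_terms.pop(doc_id, set()):
            self.fresh_postings[term].discard(doc_id)
        terms = set(text.lower().split())
        self.fresh_terms[doc_id] = terms
        for term in terms:
            self.fresh_postings[term].add(doc_id)

    def search(self, term: str) -> list[int]:
        term = term.lower()
        # A document present in the fresh index shadows its stale version in main.
        from_main = self.main_postings.get(term, set()).difference(self.fresh_terms)
        return sorted(from_main | self.fresh_postings.get(term, set()))

    def merge_fresh_into_main(self) -> None:
        # Run on a schedule: fold every fresh document into main, then empty the fresh index.
        for doc_id, terms in self.fresh_terms.items():
            for term in self.main_terms.pop(doc_id, set()):
                self.main_postings[term].discard(doc_id)
            self.main_terms[doc_id] = terms
            for term in terms:
                self.main_postings[term].add(doc_id)
        self.fresh_postings.clear()
        self.fresh_terms.clear()


idx = TwoLevelIndex()
idx.upsert(1, "old news")
idx.merge_fresh_into_main()
idx.upsert(1, "breaking news")  # doc 1 changed after the last merge
print(idx.search("old"))        # []: the stale main copy is shadowed
```

Queries pay a small extra lookup against the fresh index, in exchange for never rewriting the large main index on every update.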