Google ★★★ Frequent Hard Inverted Index · Search · Ranking

G7 · Design Google Search Index (Inverted Index)

Verified source

Google-infra classic. Asked in Search team onsites. Credibility: A.

Key decisions

  • **Shard by doc**: each shard is a full mini-index over its own subset of documents; a query is scattered to every shard and the partial results are gathered and merged.
  • **Tiered storage**: hot tier in RAM + SSD; cold tier on disk. Caffeine-style incremental updates.
  • **Query planner**: decides how to evaluate AND/OR/phrase queries and uses skip pointers in the posting lists to prune traversal.
  • **Two-phase ranking**: cheap BM25 retrieval of the top-K candidates, then an expensive learned re-ranker over those candidates only.
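As a rough illustration of how doc sharding, scatter/gather, and two-phase ranking fit together, here is a minimal in-memory sketch. The names `Shard`, `scatter_gather`, and the `rerank` callback are hypothetical and chosen for this example only. It assumes whitespace tokenization, AND-only queries, plain set intersection in place of skip pointers, and BM25 computed from shard-local statistics, which is a simplification.

```python
import heapq
import math
from collections import Counter, defaultdict


class Shard:
    """One document-partitioned shard: a self-contained mini-index (hypothetical)."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.doc_len = {}                  # doc_id -> token count

    def add(self, doc_id, text):
        tokens = text.lower().split()
        self.doc_len[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            self.postings[term][doc_id] = tf

    def search(self, terms, k, k1=1.2, b=0.75):
        """Phase 1: AND query scored with BM25 from shard-local statistics."""
        if not terms or any(t not in self.postings for t in terms):
            return []
        # Intersect starting from the rarest term to keep the candidate set small.
        ordered = sorted(terms, key=lambda t: len(self.postings[t]))
        candidates = set(self.postings[ordered[0]])
        for t in ordered[1:]:
            candidates.intersection_update(self.postings[t])
        n = len(self.doc_len)
        avg_len = sum(self.doc_len.values()) / n
        hits = []
        for doc_id in candidates:
            score = 0.0
            for t in terms:
                df = len(self.postings[t])
                idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
                tf = self.postings[t][doc_id]
                norm = tf + k1 * (1 - b + b * self.doc_len[doc_id] / avg_len)
                score += idf * tf * (k1 + 1) / norm
            hits.append((score, doc_id))
        return heapq.nlargest(k, hits)


def scatter_gather(shards, query, k=10, rerank=None):
    """Send the query to every shard, merge the partial top-K lists,
    then optionally apply a costlier re-ranker to the merged top-K only."""
    terms = query.lower().split()
    partial = [hit for shard in shards for hit in shard.search(terms, k)]
    top = heapq.nlargest(k, partial)
    if rerank is not None:
        # Phase 2: rerank(query, doc_id) returns a score; higher is better.
        top = sorted(top, key=lambda hit: rerank(query, hit[1]), reverse=True)
    return [doc_id for _, doc_id in top]
```

Documents could be routed to shards by something like `doc_id % len(shards)`. Because every shard returns at most K hits, the gather step merges at most K × (number of shards) candidates regardless of corpus size, and the learned re-ranker only ever sees K documents.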

Follow-ups

  • Freshness vs. scale? Keep a separate fresh-doc index that is merged into the main index on a schedule.
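One way to picture the fresh-doc index is as a small overlay that absorbs writes and is consulted alongside the main index at query time. The sketch below is only an illustration: `TwoTierIndex` and its methods are hypothetical names, the postings hold bare doc IDs, and updates or deletions of documents already in the main index are not handled.

```python
class TwoTierIndex:
    """Main index plus a small fresh overlay (hypothetical sketch)."""

    def __init__(self):
        self.main = {}   # term -> sorted list of doc_ids (large, changed only by merge)
        self.fresh = {}  # term -> set of doc_ids (small, updated on every write)

    def add(self, doc_id, text):
        # New documents are searchable immediately via the fresh tier.
        for term in set(text.lower().split()):
            self.fresh.setdefault(term, set()).add(doc_id)

    def lookup(self, term):
        # Queries read both tiers and union the postings.
        merged = set(self.main.get(term, ()))
        merged |= self.fresh.get(term, set())
        return sorted(merged)

    def merge(self):
        # Scheduled job: fold the fresh postings into main, then reset the overlay.
        for term, doc_ids in self.fresh.items():
            self.main[term] = sorted(set(self.main.get(term, ())) | doc_ids)
        self.fresh = {}
```

The trade-off this models: writes stay cheap because they touch only the small tier, while the expensive rewrite of the large tier happens in batches.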

Related study-guide topics