OpenAI ★★ Frequent Hard ANNHNSWSharding

O18 · Design a Vector Database O18 · 设计向量数据库

Verified source经核实出处

Prompt: "Design a Vector Database." — Jobright. Credibility C.

Problem size规模

Billions of 768-1536 dim vectors, sub-100ms query latency, filtered by metadata (tenant, date, source).数十亿 768-1536 维向量,查询延迟 < 100ms,支持 metadata 过滤(租户、日期、来源)。

ANN index familiesANN 索引家族

  • HNSW: graph-based, best recall/speed trade-off, higher memory.HNSW:基于图,召回/速度权衡最佳,内存更高。
  • IVF-PQ: inverted file + product quantization, lower memory, worse recall at high K.IVF-PQ:倒排 + 乘积量化,内存低,高 K 召回偏弱。
  • ScaNN (Google) and DiskANN (Microsoft) for disk-resident indices.ScaNN(Google)与 DiskANN(微软)用于磁盘索引。

Architecture架构

flowchart LR
  W[Writer] --> SHARD[Sharder]
  SHARD --> S1[Shard 1: HNSW]
  SHARD --> S2[Shard 2: HNSW]
  SHARD --> S3[Shard 3: HNSW]
  Q[Query] --> ROUTE[Router]
  ROUTE --> S1 & S2 & S3
  S1 & S2 & S3 --> MERGE[Top-K Merger]
  MERGE --> RET[Return]

Key decisions关键决策

  • Sharding: random or by tenant. Random = balance but query fans to all shards. By tenant = locality but hot-tenant risk.分片:随机或按租户。随机 = 均衡但 query 扇出;按租户 = 局部性但有热租户风险。
  • Metadata filter: pre-filter can crash recall; post-filter wastes compute. Hybrid "filter-then-index" per-shard is often best.Metadata 过滤:预过滤会破坏召回;后过滤浪费算力。每分片 hybrid「filter-then-index」通常最优。
  • Ingestion pipeline: batched writes; async index rebuild; immutable segments like LSM-tree.采集管道:批量写;异步重建索引;不可变段(类 LSM)。
  • Freshness: if real-time writes required, maintain a live search layer + periodic merge into cold HNSW index.新鲜度:若需实时写入,维护实时搜索层 + 周期合并到冷 HNSW 索引。

Related study-guide topics相关学习手册专题