Anthropic · ★★★ · Frequent · Medium · Prompt Cache · KV Cache · Cost

A38 · Design Claude's Prompt Caching Service

Verified source

Anthropic launched prompt caching in 2024 (docs). Credibility: A.

Architecture

flowchart LR
  Req --> P[Prefix parser w/ cache_control]
  P --> H[Hash prefix]
  H --> ROUTE[Consistent hash to GPU]
  ROUTE --> GPU[GPU - KV cache pool]
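
The flow above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Anthropic's implementation: the block shape (dicts with `text` and an optional `cache_control` key), the use of SHA-256, the virtual-node count, and the GPU names are all hypothetical.

```python
import bisect
import hashlib
import json


def cacheable_prefix(blocks):
    """Return the blocks up to and including the last one carrying cache_control."""
    last = -1
    for i, block in enumerate(blocks):
        if "cache_control" in block:
            last = i
    return blocks[: last + 1]


def prefix_hash(blocks):
    """Hash the prefix text into a stable hex digest (SHA-256 is an assumption)."""
    payload = json.dumps([b["text"] for b in blocks], ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class HashRing:
    """Consistent-hash ring with virtual nodes; node names are hypothetical."""

    def __init__(self, nodes, vnodes=64):
        self._ring = sorted(
            (self._point(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _point(key):
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def route(self, key):
        # First ring point clockwise from the key's position, wrapping around.
        idx = bisect.bisect(self._points, self._point(key)) % len(self._ring)
        return self._ring[idx][1]


def route_request(blocks, ring):
    """Return (prefix hash, GPU) for a request, or (None, None) if nothing is marked."""
    prefix = cacheable_prefix(blocks)
    if not prefix:
        return None, None
    key = prefix_hash(prefix)
    return key, ring.route(key)
```

Two requests that share a marked system prompt but differ in the user turn produce the same prefix hash, so `route_request` sends both to the same GPU. Consistent hashing keeps most prefixes on their original GPU when a node is added or removed.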

Key decisions

  • **Explicit markers**: the user annotates which parts of the prompt to cache; this avoids false positives from automatic deduplication.
  • **Prefix-aligned**: the cache covers a contiguous leading segment of the prompt; differences after the cached prefix do not invalidate it.
  • **5-min TTL**: matches the typical gap between turns in a session and avoids holding GPU memory for stale prompts.
  • **Sticky routing**: requests with the same prefix hash go to the same GPU, so they find the warm KV cache; identical to O31 (OpenAI prompt cache).
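
The TTL decision can be illustrated with a small in-memory stand-in for a per-GPU cache pool. This is a sketch under assumptions: the class name `KVCachePool`, the opaque `kv_handle` values, and the refresh-on-hit policy are illustrative choices, not details given in the text above.

```python
import time


class KVCachePool:
    """In-memory stand-in for a per-GPU KV cache pool with a fixed TTL."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the TTL can be tested without sleeping
        self._entries = {}  # prefix hash -> (kv_handle, expires_at)

    def get(self, key):
        """Return the cached handle, or None on a miss or an expired entry."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        kv_handle, expires_at = entry
        now = self.clock()
        if now >= expires_at:
            del self._entries[key]
            return None
        # Assumption: every hit refreshes the TTL so active sessions stay warm.
        self._entries[key] = (kv_handle, now + self.ttl)
        return kv_handle

    def put(self, key, kv_handle):
        """Store a handle with a fresh TTL."""
        self._entries[key] = (kv_handle, self.clock() + self.ttl)

    def evict_expired(self):
        """Drop stale entries and return how many were removed."""
        now = self.clock()
        stale = [k for k, (_, exp) in self._entries.items() if now >= exp]
        for k in stale:
            del self._entries[k]
        return len(stale)
```

With a 300-second TTL, a session that sends a turn every few minutes keeps its prefix resident, while an abandoned prompt is released after five idle minutes.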

Follow-ups

  • Granularity? The minimum cacheable prefix is 1024 tokens (higher for some models); shorter prefixes are not cached.
  • User changes the cached prefix slightly? Cache miss; the prefix must be re-warmed (recomputed and written to the cache again).
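
Both follow-ups can be shown with a toy lookup. This is a hedged sketch: it assumes the cache key is a hash over the entire token prefix, and the names `cache_key` and `lookup` and the set-based cache are hypothetical.

```python
import hashlib

MIN_CACHEABLE_TOKENS = 1024  # threshold quoted above


def cache_key(token_ids):
    """Return a key for a token prefix, or None if it is too short to cache."""
    if len(token_ids) < MIN_CACHEABLE_TOKENS:
        return None
    data = b"".join(t.to_bytes(4, "big") for t in token_ids)
    return hashlib.sha256(data).hexdigest()


def lookup(cache, token_ids):
    """Return 'skip', 'hit', or 'miss'; a miss records the key as re-warmed."""
    key = cache_key(token_ids)
    if key is None:
        return "skip"
    if key in cache:
        return "hit"
    cache.add(key)  # stands in for recomputing and storing the KV tensors
    return "miss"
```

Because the key covers every token in the prefix, changing a single token produces a different key, so the first request after an edit misses and later requests with the edited prefix hit.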

Related study-guide topics