Anthropic · ★★★ · Frequent · Medium · Prompt Cache · KV Cache · Cost

A38 · Design Claude's Prompt Caching Service

Verified source

Anthropic launched prompt caching in 2024 (docs). Credibility: A.

Architecture

flowchart LR
  Req --> P[Prefix parser w/ cache_control]
  P --> H[Hash prefix]
  H --> ROUTE[Consistent hash to GPU]
  ROUTE --> GPU[GPU - KV cache pool]
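
The flow above can be sketched in Python. This is a minimal illustration under stated assumptions, not Anthropic's implementation: the block layout (`text` plus an optional `cache_control` key), the helpers `cacheable_prefix` and `prefix_hash`, the `HashRing` class, and the GPU node names are all hypothetical.

```python
import bisect
import hashlib
import json


def cacheable_prefix(blocks):
    """Return the leading blocks up to and including the last one marked with cache_control."""
    last = -1
    for i, block in enumerate(blocks):
        if "cache_control" in block:
            last = i
    return blocks[: last + 1]


def prefix_hash(blocks):
    """Stable SHA-256 hex digest over the text of the prefix blocks."""
    payload = json.dumps([b["text"] for b in blocks], ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class HashRing:
    """Consistent-hash ring with virtual nodes (hypothetical router)."""

    def __init__(self, nodes, vnodes=64):
        self._ring = sorted(
            (self._point(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _point(key):
        # Map a string to a 64-bit position on the ring.
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest()[:16], 16)

    def route(self, key):
        # First ring point clockwise from the key, wrapping around at the end.
        idx = bisect.bisect_right(self._points, self._point(key)) % len(self._ring)
        return self._ring[idx][1]


# Usage: two requests that share the marked prefix land on the same node.
ring = HashRing(["gpu-0", "gpu-1", "gpu-2"])
system = {"text": "You are a contract-review assistant.", "cache_control": {"type": "ephemeral"}}
first = [system, {"text": "Summarize clause 4."}]
second = [system, {"text": "List the termination conditions."}]
print(ring.route(prefix_hash(cacheable_prefix(first))))
print(ring.route(prefix_hash(cacheable_prefix(second))))
```

Hashing only the marked prefix, rather than the whole request, is what lets requests with different trailing content reach the GPU that already holds the matching KV cache.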

Key decisions

**Explicit markers**: the user annotates which parts of the prompt to cache, which avoids the false matches that automatic deduplication could produce.
**Prefix-aligned**: the cache covers one contiguous leading segment, so differences after the cached prefix do not invalidate it.
**5-min TTL**: matches the typical pace of session turns and avoids holding GPU memory for stale prompts.
**Sticky routing**: identical to O31 (OpenAI prompt cache).
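
The TTL decision can be illustrated with a small in-memory sketch. Everything here is an assumption made for illustration: the `TTLPrefixCache` class, the opaque `kv_handle` values standing in for GPU-resident KV tensors, and in particular the choice to refresh the expiry on every hit, which the text above does not specify.

```python
import time


class TTLPrefixCache:
    """Maps a prefix key to a KV-cache handle and drops entries after ttl_seconds."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry logic can be tested
        self._entries = {}   # prefix_key -> (kv_handle, expires_at)

    def put(self, prefix_key, kv_handle):
        self._entries[prefix_key] = (kv_handle, self._clock() + self._ttl)

    def get(self, prefix_key):
        entry = self._entries.get(prefix_key)
        if entry is None:
            return None
        kv_handle, expires_at = entry
        now = self._clock()
        if now >= expires_at:
            # Stale entry: release it instead of serving it.
            del self._entries[prefix_key]
            return None
        # Assumption: a hit extends the entry's lifetime by a full TTL.
        self._entries[prefix_key] = (kv_handle, now + self._ttl)
        return kv_handle

    def evict_expired(self):
        """Remove all expired entries and return how many were dropped."""
        now = self._clock()
        stale = [k for k, (_, exp) in self._entries.items() if now >= exp]
        for key in stale:
            del self._entries[key]
        return len(stale)
```

A background sweep calling `evict_expired` is one way to return GPU memory promptly. Lazy expiry inside `get` alone would leave untouched stale entries resident indefinitely.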

Follow-ups

Granularity? The minimum is 1024 tokens; shorter prefixes are not cached.
User changes the cached prefix slightly? Cache miss; the cache must be re-warmed.
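
Both answers follow from how a cache key might be derived. The sketch below assumes the key is an exact hash over the prefix token IDs. The function `cache_key`, the constant `MIN_CACHEABLE_TOKENS`, and the 4-byte token encoding are hypothetical.

```python
import hashlib

MIN_CACHEABLE_TOKENS = 1024  # threshold taken from the answer above


def cache_key(prefix_tokens, min_tokens=MIN_CACHEABLE_TOKENS):
    """Return a hex digest for the prefix, or None if it is too short to cache."""
    if len(prefix_tokens) < min_tokens:
        return None
    digest = hashlib.sha256()
    for token_id in prefix_tokens:
        # Fixed-width encoding so token boundaries cannot be confused.
        digest.update(token_id.to_bytes(4, "little"))
    return digest.hexdigest()


# Usage: a one-token edit inside the prefix yields a different key, so the lookup misses.
original = list(range(2000))
edited = original.copy()
edited[10] = 99999
print(cache_key(original) == cache_key(edited))
print(cache_key(original[:100]))
```

Because the key is an exact hash, there is no partial credit for a nearly identical prefix. The edited prompt is processed in full and stored under its own key.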
