Anthropic · ★★★ Frequent · Hard · Feature Store · Drift

A16 · Low-Latency ML Inference API

Verified source

Prompt: "Design a low-latency ML inference API…SLOs…feature retrieval…canary/rollbacks…drift detection" — PracHub, Onsite. Credibility B.

Three numbers you must have

  • p95 latency (e.g. 50–150 ms, depending on the business).
  • Availability SLO (99.9% or 99.99%).
  • QPS / throughput, for capacity math.
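
A rough way to turn those three numbers into capacity math is sketched below. It assumes Little's law (in-flight requests = arrival rate × mean latency) and a 30-day SLO window. The function names, the `target_utilization` parameter, and the example figures are illustrative assumptions, not part of the original prompt.

```python
import math


def replicas_needed(qps: float, mean_latency_ms: float,
                    concurrency_per_replica: int,
                    target_utilization: float = 0.5) -> int:
    """Estimate the replica count from Little's law.

    in-flight requests = arrival rate (req/s) * mean time in system (s).
    target_utilization is an assumed safety margin: each replica is only
    planned to run at this fraction of its maximum concurrency.
    """
    in_flight = qps * mean_latency_ms / 1000.0
    usable_slots = concurrency_per_replica * target_utilization
    return math.ceil(in_flight / usable_slots)


def downtime_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability in the window for a given SLO."""
    return (1.0 - slo) * window_days * 24 * 60


if __name__ == "__main__":
    # Hypothetical workload: 2000 QPS, 50 ms mean latency, 8 slots per replica.
    print(replicas_needed(2000, 50, 8))      # replicas at 50% utilization
    print(downtime_budget_minutes(0.999))    # roughly 43 minutes per 30 days
    print(downtime_budget_minutes(0.9999))   # roughly 4.3 minutes per 30 days
```

Note that Little's law uses mean latency, not p95. The p95 target constrains per-request behavior, while the mean drives the concurrency estimate.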

Architecture

flowchart LR
  U[Product Service] --> API[Inference API]
  API --> FS[Feature Store]
  API --> MS[Model Server]
  MS --> API
  API --> MON[Metrics + Drift]
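
The "Metrics + Drift" box can be made concrete in several ways. One common option, assumed here rather than named in the source, is the Population Stability Index (PSI), which compares a live feature's distribution against a training-time baseline. The bucket edges, sample values, and the alert threshold mentioned afterwards are illustrative.

```python
import math
from typing import List, Sequence


def bucket_fractions(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Fraction of values in each bucket defined by sorted edges.

    len(edges) + 1 buckets: (-inf, e0), [e0, e1), ..., [e_last, +inf).
    """
    if not values:
        raise ValueError("values must not be empty")
    counts = [0] * (len(edges) + 1)
    for v in values:
        index = sum(1 for e in edges if v >= e)
        counts[index] += 1
    return [c / len(values) for c in counts]


def psi(expected: Sequence[float], actual: Sequence[float],
        edges: Sequence[float], eps: float = 1e-6) -> float:
    """PSI = sum over buckets of (actual - expected) * ln(actual / expected).

    eps floors empty buckets so the logarithm stays finite.
    """
    total = 0.0
    for p, q in zip(bucket_fractions(expected, edges),
                    bucket_fractions(actual, edges)):
        p, q = max(p, eps), max(q, eps)
        total += (q - p) * math.log(q / p)
    return total


if __name__ == "__main__":
    baseline = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
    live = [v + 0.5 for v in baseline]  # simulated upward shift
    edges = [0.25, 0.5, 0.75]
    print(psi(baseline, baseline, edges))  # identical distributions
    print(psi(baseline, live, edges))      # shifted distribution
```

A commonly quoted rule of thumb treats PSI above about 0.25 as a significant shift, but the alert threshold should be tuned per feature.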

Feature store realism

  • The online store must be low-latency. Training-serving skew is the #1 bug.
  • Cache hot features with a TTL and a version.
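
A minimal sketch of the "TTL + version" cache idea follows. It assumes an in-process dictionary keyed by `(entity_id, feature_version)`, so a model pinned to one feature version never reads values computed under another. The class name, key shape, and injectable clock are hypothetical choices made for illustration.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class VersionedTTLCache:
    """In-process cache keyed by (entity_id, feature_version), with expiry."""

    def __init__(self, ttl_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl_s = ttl_s
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[Tuple[str, str], Tuple[float, Any]] = {}

    def put(self, entity_id: str, feature_version: str, value: Any) -> None:
        expires_at = self._clock() + self._ttl_s
        self._entries[(entity_id, feature_version)] = (expires_at, value)

    def get(self, entity_id: str, feature_version: str) -> Optional[Any]:
        key = (entity_id, feature_version)
        entry = self._entries.get(key)
        if entry is None:
            return None  # miss: never cached, or cached under another version
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # stale: evict and report a miss
            return None
        return value


if __name__ == "__main__":
    cache = VersionedTTLCache(ttl_s=30.0)
    cache.put("user:42", "v2", {"clicks_7d": 3.0})
    print(cache.get("user:42", "v2"))  # hit
    print(cache.get("user:42", "v1"))  # miss: different feature version
```

Putting the version in the key is one way to keep a cached value from leaking across a feature-definition change, which is a frequent source of training-serving skew.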

Fallback (don't crash on partial failure)

  • Feature store slow → return default features / use a smaller model.
  • GPU pool starved → switch to CPU / a smaller model.
  • Model anomaly → roll back to the previous version (always keep the prior one warm).
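
The first fallback rule can be sketched as a deadline around the feature fetch. This is a minimal illustration that assumes a thread pool and a 20 ms budget. `DEFAULT_FEATURES`, `fetch_with_deadline`, `predict`, and the model callables are hypothetical names, not an API from the source.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable, Dict, Tuple

Features = Dict[str, float]

# Assumed safe defaults served when the feature store misses its deadline.
DEFAULT_FEATURES: Features = {"clicks_7d": 0.0}

_pool = ThreadPoolExecutor(max_workers=4)


def fetch_with_deadline(fetch: Callable[[str], Features], entity_id: str,
                        timeout_s: float) -> Tuple[Features, bool]:
    """Return (features, degraded). Falls back to defaults on timeout or error."""
    future = _pool.submit(fetch, entity_id)
    try:
        return future.result(timeout=timeout_s), False
    except FutureTimeout:
        future.cancel()  # best effort: a call that is already running keeps going
        return dict(DEFAULT_FEATURES), True
    except Exception:
        return dict(DEFAULT_FEATURES), True


def predict(entity_id: str, fetch: Callable[[str], Features],
            primary_model: Callable[[Features], float],
            small_model: Callable[[Features], float],
            timeout_s: float = 0.02) -> Dict[str, object]:
    """Serve from the primary model, or the smaller one when degraded."""
    features, degraded = fetch_with_deadline(fetch, entity_id, timeout_s)
    model = small_model if degraded else primary_model
    return {"score": model(features), "degraded": degraded}


if __name__ == "__main__":
    def slow_fetch(entity_id: str) -> Features:
        time.sleep(0.2)  # simulate a slow feature store
        return {"clicks_7d": 3.0}

    result = predict("user:42", slow_fetch,
                     primary_model=lambda f: f["clicks_7d"] * 2,
                     small_model=lambda f: 0.5)
    print(result)
```

Returning the `degraded` flag with the score lets the metrics pipeline count how often fallbacks fire, which is worth alerting on separately from latency.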

Related study-guide topics