A17 · Review an Inference API Design for Scale
Verified source
Prompt: "You are reviewing another engineer's design doc…critique…SLOs…autoscaling…circuit breakers…canary…audit logs…cost controls." — PracHub, Onsite. Credibility B.
Standard answer structure (use as a checklist)
- Fill in missing SLOs and capacity assumptions first.
- Find single points of failure and failure modes (GPU OOM, hot-swap failure, queue backlog, cross-AZ partition).
- Prioritize changes: safety first (rate limit / circuit breaker / rollback) → efficiency (batch / cache) → cost (SKU pool / valley-fill).
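A capacity assumption is easiest to critique once it is written as arithmetic. The sketch below is one way to do that; the field names, the token-throughput model, and every number in the example are illustrative assumptions, not values from any real design doc.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CapacityAssumptions:
    peak_rps: float              # peak requests per second (assumed)
    avg_output_tokens: float     # mean generated tokens per request (assumed)
    replica_tokens_per_s: float  # sustained decode throughput of one replica (assumed)
    target_utilization: float    # e.g. 0.5 keeps half of each replica as headroom


def replicas_needed(c: CapacityAssumptions, spares: int = 1) -> int:
    """Replicas required at peak for the target utilization, plus spare replicas."""
    if not 0 < c.target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    demand_tokens_per_s = c.peak_rps * c.avg_output_tokens
    usable_per_replica = c.replica_tokens_per_s * c.target_utilization
    return math.ceil(demand_tokens_per_s / usable_per_replica) + spares


example = CapacityAssumptions(
    peak_rps=200,
    avg_output_tokens=300,
    replica_tokens_per_s=2500,
    target_utilization=0.5,
)
print(replicas_needed(example))  # 60000 / 1250 = 48, plus one spare -> 49
```

If the design doc cannot supply these four inputs, that gap is itself the first review finding.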
Failure-mode catalog
- Overload: admission control + token-level backpressure.
- Bad release: canary + automated rollback triggered by error-budget burn alerts.
- Noisy neighbor: per-tenant isolation, quotas, priority queues.
- Data corruption: immutable inputs, output signing, audit log.
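The overload and noisy-neighbor items can be combined in one mechanism: a per-tenant budget counted in model tokens rather than requests. The following is a minimal sketch under that assumption; `TenantBudget`, `TokenAdmission`, and the quota numbers are hypothetical names and values chosen for illustration, and the clock is passed in explicitly so the behavior is deterministic.

```python
class TenantBudget:
    """Token bucket counted in model tokens, not requests (hypothetical helper)."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate      # tokens refilled per second (assumed quota)
        self.burst = burst    # maximum tokens that can accumulate
        self.level = burst    # start with a full bucket
        self.last = 0.0       # timestamp of the last refill, in seconds


class TokenAdmission:
    """Admit a request only if its tenant can pay the estimated token cost."""

    def __init__(self, budgets: dict[str, TenantBudget]) -> None:
        self.budgets = budgets

    def admit(self, tenant: str, est_tokens: int, now: float) -> bool:
        b = self.budgets[tenant]
        # Refill for the elapsed time, capped at the burst size.
        b.level = min(b.burst, b.level + (now - b.last) * b.rate)
        b.last = now
        if est_tokens > b.level:
            return False  # backpressure: caller rejects or queues the request
        b.level -= est_tokens
        return True


admission = TokenAdmission({
    "tenant-a": TenantBudget(rate=100, burst=1000),
    "tenant-b": TenantBudget(rate=100, burst=1000),
})
print(admission.admit("tenant-a", 800, now=0.0))  # fits the initial burst
print(admission.admit("tenant-a", 800, now=1.0))  # budget not yet refilled
print(admission.admit("tenant-b", 800, now=1.0))  # other tenant is unaffected
```

Because each tenant draws from its own bucket, one tenant exhausting its budget does not change the admission decision for another, which is the isolation property the catalog asks for.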
Tip
Anchor each recommendation to an SLO change (e.g., "this canary policy cuts blast radius from 100% → 1% for bad releases"). Raw advice without SLO framing loses points.
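The blast-radius claim can be stated as error-budget arithmetic. The sketch below assumes a 99.9% availability SLO, a bad release that fails half of the requests it serves, and a rollback threshold of 10x burn; all three numbers and the function names are illustrative assumptions.

```python
def burn_rate(error_rate: float, slo_target: float) -> float:
    """How many times faster than allowed the error budget is being spent."""
    budget = 1.0 - slo_target
    return error_rate / budget


def global_error_rate(canary_fraction: float, canary_error_rate: float,
                      baseline_error_rate: float) -> float:
    """Error rate seen across all traffic when only a fraction hits the new release."""
    return (canary_fraction * canary_error_rate
            + (1.0 - canary_fraction) * baseline_error_rate)


def should_rollback(canary_error_rate: float, slo_target: float,
                    max_burn: float) -> bool:
    """Trigger rollback when the canary alone burns budget faster than max_burn."""
    return burn_rate(canary_error_rate, slo_target) > max_burn


SLO = 0.999       # assumed availability target
BAD_RATE = 0.5    # assumed failure rate of the bad release

full_rollout = burn_rate(global_error_rate(1.0, BAD_RATE, 0.0), SLO)
one_percent = burn_rate(global_error_rate(0.01, BAD_RATE, 0.0), SLO)
print(round(full_rollout), round(one_percent))
print(should_rollback(BAD_RATE, SLO, max_burn=10.0))
```

Under these assumptions the full rollout burns budget at roughly 500x the sustainable rate and the 1% canary at roughly 5x, which is the kind of before/after figure a review comment can cite.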