A31 · Design a Rate Limiter for the Claude API
Verified source
Anthropic Claude API has documented tiered rate limits (docs). Reported at Anthropic onsites. Credibility A.
Dimensions
- **RPM** (requests per minute): coarse DoS guard.
- **ITPM / OTPM** (input and output tokens per minute, counted separately): LLM cost tracks compute, not just request count.
- **TPD** (tokens per day): soft budget cap per tier.
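
As a rough sketch of how one request might be charged against these dimensions at admission time: the names `TierLimits` and `charges` are hypothetical, the limit values are illustrative only, and charging both input and reserved output tokens to TPD is an assumption rather than something the source states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierLimits:
    """Per-tier quotas; field names mirror the dimensions above."""
    rpm: int   # requests per minute
    itpm: int  # input tokens per minute
    otpm: int  # output tokens per minute
    tpd: int   # tokens per day


def charges(input_tokens: int, max_output_tokens: int) -> dict[str, int]:
    """Amount one request draws from each bucket when it is admitted."""
    return {
        "rpm": 1,
        "itpm": input_tokens,
        "otpm": max_output_tokens,  # worst case, known before inference runs
        "tpd": input_tokens + max_output_tokens,  # assumption: both count
    }


# Illustrative numbers only, not real tier values.
example_tier = TierLimits(rpm=50, itpm=40_000, otpm=8_000, tpd=1_000_000)
cost = charges(input_tokens=1_000, max_output_tokens=500)
fits = all(cost[name] <= getattr(example_tier, name) for name in cost)
```

Keeping the four dimensions in one cost map makes it straightforward to check and decrement them together in a single step.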
Architecture
flowchart LR
    API --> GW
    GW --> RL[Rate limit check - Redis Lua]
    RL -->|deny| R429[429]
    RL -->|allow| INF[Inference]
    INF --> METER[Emit usage event]
    METER --> RL
Key decisions
- **Token reservation**: at admission, the request reserves its maximum possible output tokens; the unused portion is refunded after completion.
- **Atomic Lua in Redis**: a single script performs check-and-decrement across multiple buckets.
- **Sticky shard by org_id**: minimises cross-shard chatter; TPD uses global reconciliation.
- **429 with Retry-After + ratelimit-remaining headers**: clients can self-throttle.
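
The reservation, all-or-nothing decrement, refund, and 429 headers above can be sketched in memory. This is a single-process stand-in for what the design does atomically in a Redis Lua script, not that script itself. The function names, the `x-ratelimit-remaining-*` header names, and the choice to refund unused output to both OTPM and TPD are assumptions made for illustration.

```python
def try_reserve(buckets: dict[str, int], cost: dict[str, int]) -> bool:
    """All-or-nothing check-and-decrement across every bucket named in cost."""
    if any(buckets[name] < amount for name, amount in cost.items()):
        return False  # one bucket is short, so nothing is decremented
    for name, amount in cost.items():
        buckets[name] -= amount
    return True


def refund_unused(buckets: dict[str, int], reserved: int, used: int) -> int:
    """Return the unused part of an output reservation (assumed: OTPM and TPD)."""
    unused = max(reserved - used, 0)
    buckets["otpm"] += unused
    buckets["tpd"] += unused
    return unused


def admit(
    buckets: dict[str, int],
    input_tokens: int,
    max_output_tokens: int,
    retry_after_s: int = 1,
) -> tuple[int, dict[str, str]]:
    """Reserve worst-case usage; on failure return 429 plus throttle hints."""
    cost = {
        "rpm": 1,
        "itpm": input_tokens,
        "otpm": max_output_tokens,
        "tpd": input_tokens + max_output_tokens,
    }
    if try_reserve(buckets, cost):
        return 200, {}
    headers = {"retry-after": str(retry_after_s)}
    for name, left in buckets.items():
        # Hypothetical header names; a real API defines its own.
        headers[f"x-ratelimit-remaining-{name}"] = str(left)
    return 429, headers


# Usage: admit a request, then refund after the model emits fewer tokens.
org = {"rpm": 2, "itpm": 1_000, "otpm": 500, "tpd": 10_000}
status, _ = admit(org, input_tokens=400, max_output_tokens=300)
refund_unused(org, reserved=300, used=120)
```

Checking every bucket before decrementing any of them is what keeps a denied request from partially consuming quota. In Redis, running the same logic inside one script provides that guarantee across concurrent gateways.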
Follow-ups
- Burst handling? Token bucket with burst = 2x the refill rate.
- Priority queue for enterprise? Separate queue + weighted shared pool.
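
For the burst answer, a minimal token-bucket sketch follows. "Burst = 2x refill" does not name a time unit, so this takes one possible reading: capacity is twice the amount refilled per one-minute window. `TokenBucket`, `for_limit`, and `burst_factor` are hypothetical names, and the clock is passed in explicitly so the behaviour is deterministic.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    refill_per_s: float  # steady-state refill rate
    capacity: float      # maximum tokens the bucket can hold (burst size)
    tokens: float        # tokens currently available
    updated_at: float    # timestamp of the last refill, in seconds

    @classmethod
    def for_limit(cls, per_minute: int, now: float, burst_factor: float = 2.0):
        """Build a full bucket; capacity = burst_factor x one minute of refill."""
        capacity = per_minute * burst_factor  # assumed reading of "2x refill"
        return cls(per_minute / 60.0, capacity, capacity, now)

    def try_take(self, amount: float, now: float) -> bool:
        """Refill lazily for the elapsed time, then take amount if available."""
        elapsed = max(now - self.updated_at, 0.0)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_s)
        self.updated_at = now
        if self.tokens >= amount:
            self.tokens -= amount
            return True
        return False


# Usage: a 60 RPM limit refills 1 request/s and can absorb a burst of 120.
bucket = TokenBucket.for_limit(per_minute=60, now=0.0)
burst_ok = bucket.try_take(120, now=0.0)
```

Refilling lazily on each call avoids a background timer, and the same arithmetic ports to a script that stores `tokens` and `updated_at` per key.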