Anthropic · ★★★ · Frequent · Medium · Rate Limit · Tokens · Tiers

A31 · Design a Rate Limiter for the Claude API

Verified source

The Anthropic Claude API has documented tiered rate limits (official docs). Reported at Anthropic onsites. Credibility: A.

Dimensions

  • **RPM** (requests per minute): coarse DoS guard.
  • **ITPM / OTPM** (input / output tokens per minute): metered separately, because LLM cost tracks compute, not just request count.
  • **TPD** (tokens per day): soft budget cap per tier.
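To make the four dimensions concrete, here is a minimal Python sketch that checks one request against all of them. The class and function names are hypothetical, the numbers are invented and are not Anthropic's real tier values, and counting TPD as input plus output tokens is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierLimits:
    """Per-tier limits. Hypothetical shape; values are supplied by the caller."""
    rpm: int    # requests per minute
    itpm: int   # input tokens per minute
    otpm: int   # output tokens per minute
    tpd: int    # tokens per day (assumed here: input + output)


@dataclass
class Usage:
    """Usage already counted in the current minute and day windows."""
    requests_min: int = 0
    input_min: int = 0
    output_min: int = 0
    tokens_day: int = 0


def exceeded_dimensions(limits, usage, input_tokens, max_output_tokens):
    """Return the name of every limit this request would push past its cap."""
    checks = {
        "rpm": (usage.requests_min + 1, limits.rpm),
        "itpm": (usage.input_min + input_tokens, limits.itpm),
        "otpm": (usage.output_min + max_output_tokens, limits.otpm),
        "tpd": (usage.tokens_day + input_tokens + max_output_tokens, limits.tpd),
    }
    return [name for name, (needed, cap) in checks.items() if needed > cap]


# Made-up numbers, for illustration only.
limits = TierLimits(rpm=50, itpm=40_000, otpm=8_000, tpd=1_000_000)
usage = Usage(requests_min=10, input_min=39_000, output_min=1_000, tokens_day=200_000)
print(exceeded_dimensions(limits, usage, input_tokens=2_000, max_output_tokens=1_000))
# prints ['itpm']
```

The request is well under the RPM cap yet still rejected on input tokens, which is why request count alone is only a coarse guard.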

Architecture

flowchart LR
  API --> GW
  GW --> RL[Rate limit check - Redis Lua]
  RL -->|deny| R429[429]
  RL -->|allow| INF[Inference]
  INF --> METER[Emit usage event]
  METER --> RL

Key decisions

  • **Token reservation**: each request reserves its maximum possible output tokens on entry; the unused portion is refunded after completion.
  • **Atomic Lua in Redis**: a single script performs check-and-decrement across multiple buckets.
  • **Sticky shard by org_id**: minimises cross-shard chatter; TPD is reconciled globally.
  • **429 with Retry-After + ratelimit-remaining headers**: clients can self-throttle.
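The reservation, atomic multi-bucket check, and 429 response can be sketched together in plain Python. This is an in-memory stand-in, not Redis: a `threading.Lock` plays the role that Lua-script atomicity would play on the org's shard. All names (`OrgBuckets`, `handle`, the `ratelimit-*` header names, the 30-second reset) are illustrative assumptions.

```python
import threading


class OrgBuckets:
    """In-memory stand-in for one org's counters on its Redis shard."""

    def __init__(self, remaining):
        self.remaining = dict(remaining)  # e.g. {"rpm": 2, "itpm": 1000, "otpm": 500}
        self._lock = threading.Lock()     # stands in for Lua-script atomicity

    def try_reserve(self, cost):
        """All-or-nothing: decrement every bucket, or none if any is short."""
        with self._lock:
            short = [k for k, need in cost.items() if self.remaining[k] < need]
            if short:
                return False, short
            for k, need in cost.items():
                self.remaining[k] -= need
            return True, []

    def refund(self, bucket, amount):
        """Give back reserved-but-unused tokens once the completion finishes."""
        with self._lock:
            self.remaining[bucket] += amount


def handle(buckets, input_tokens, max_output_tokens, actual_output_tokens,
           seconds_to_reset=30):
    """Return (status, headers). Inference is faked via actual_output_tokens."""
    cost = {"rpm": 1, "itpm": input_tokens, "otpm": max_output_tokens}
    ok, short = buckets.try_reserve(cost)
    if not ok:
        # Retry-After is a standard HTTP header; the other name is hypothetical.
        return 429, {"Retry-After": str(seconds_to_reset),
                     "ratelimit-limited-by": ",".join(short)}
    # ... inference would run here ...
    buckets.refund("otpm", max_output_tokens - actual_output_tokens)
    return 200, {"ratelimit-remaining-otpm": str(buckets.remaining["otpm"])}


demo = OrgBuckets({"rpm": 2, "itpm": 1000, "otpm": 500})
print(handle(demo, input_tokens=100, max_output_tokens=400, actual_output_tokens=50))
print(handle(demo, input_tokens=100, max_output_tokens=500, actual_output_tokens=10))
```

The first call reserves 400 output tokens and refunds 350, leaving 450. The second asks for 500, so it is denied and no bucket is touched, which is the property the atomic script is meant to guarantee.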

Follow-ups

  • Burst handling? Token bucket with burst = 2x refill rate.
  • Priority queue for enterprise? Separate queue + weighted shared pool.
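The burst answer can be illustrated with a small token bucket. This sketch reads "burst = 2x refill" as a capacity of twice the per-minute refill amount, which is one interpretation, not something the source spells out. The class name and the explicit `now` argument (used so the behaviour is deterministic) are assumptions.

```python
class TokenBucket:
    """Token bucket whose capacity is a multiple of its per-minute refill."""

    def __init__(self, refill_per_minute, burst_multiplier=2.0, now=0.0):
        self.rate = refill_per_minute / 60.0              # tokens per second
        self.capacity = refill_per_minute * burst_multiplier
        self.tokens = self.capacity                       # start full
        self.updated = now

    def _refill(self, now):
        """Add tokens for the elapsed time, never exceeding capacity."""
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now

    def try_take(self, amount, now):
        """Consume `amount` tokens if available; otherwise leave the bucket as is."""
        self._refill(now)
        if amount > self.tokens:
            return False
        self.tokens -= amount
        return True


bucket = TokenBucket(refill_per_minute=600)   # 10 tokens/s, capacity 1200
print(bucket.try_take(1200, now=0.0))   # True: a full 2x burst is allowed
print(bucket.try_take(1, now=0.0))      # False: bucket is empty
print(bucket.try_take(100, now=10.0))   # True: 10 s of refill restored 100 tokens
```

A caller that has been idle can spend up to two minutes' worth of quota at once, after which throughput falls back to the steady refill rate.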

Related study-guide topics