OpenAI ★★ Frequent Medium Token BucketDistributed

O17 · Design a Rate Limiter O17 · 设计限流器

Verified source经核实出处

Prompt: "Design a Rate Limiter." — Jobright, IGotAnOffer. Classic with AI twist (per-token billing). Credibility C/D.

Algorithms compared算法对比

AlgorithmPropertiesUse-case
Token bucketBurst allowed up to bucket size, steady refill.允许突发到桶大小,稳定续 token。Most APIs — OpenAI-style.多数 API,OpenAI 风格。
Leaky bucketStrict smoothing; queue with fixed drain.严格平滑;定速流出队列。Traffic shaping; streaming.流量整形;流式场景。
Fixed windowSimple; burst at window boundary.简单;窗口边界有突发。Coarse quotas, daily limits.粗粒度配额、日限。
Sliding window logExact but memory-heavy.精确但内存开销大。Small scale, accuracy-critical.小规模,精度关键。
Sliding window counterApproximate; O(1) memory.近似;O(1) 内存。Most APIs.多数 API。

Distributed coordination分布式协调

  • Redis + Lua for atomic token-bucket operations (check-and-decrement).Redis + Lua原子化令牌桶(先查后扣一步完成)。
  • Sticky hashing of key (user/API key) to specific shard → locality + consistency.按 key 黏附哈希(用户/API key)到特定 shard → 局部性 + 一致性。
  • Client-side check with server reconciliation: gateway pre-filters, Redis authoritatively decides.客户端预检 + 服务端权威:网关预过滤,Redis 最终拍板。

AI-specific twistAI 专属扩展

For LLM APIs, limit on tokens (input+output) not just requests. Requires predicting output tokens or reserving max then refunding unused.LLM API 的限流应基于 token(输入+输出)而非请求数。需预估输出 token 或先 reserve 最大再返还未用部分。

Related study-guide topics相关学习手册专题