OpenAI ★★ Frequent Medium Token BucketDistributed

O17 · Design a Rate Limiter O17 · 设计限流器

Verified source经核实出处

Prompt: "Design a Rate Limiter." — Jobright, IGotAnOffer. Classic with AI twist (per-token billing). Credibility C/D.

Algorithms compared算法对比

Algorithm	Properties	Use-case
Token bucket	Burst allowed up to bucket size, steady refill.允许突发到桶大小，稳定续 token。	Most APIs — OpenAI-style.多数 API，OpenAI 风格。
Leaky bucket	Strict smoothing; queue with fixed drain.严格平滑；定速流出队列。	Traffic shaping; streaming.流量整形；流式场景。
Fixed window	Simple; burst at window boundary.简单；窗口边界有突发。	Coarse quotas, daily limits.粗粒度配额、日限。
Sliding window log	Exact but memory-heavy.精确但内存开销大。	Small scale, accuracy-critical.小规模，精度关键。
Sliding window counter	Approximate; O(1) memory.近似；O(1) 内存。	Most APIs.多数 API。

Distributed coordination分布式协调

Redis + Lua for atomic token-bucket operations (check-and-decrement).Redis + Lua原子化令牌桶（先查后扣一步完成）。
Sticky hashing of key (user/API key) to specific shard → locality + consistency.按 key 黏附哈希（用户/API key）到特定 shard → 局部性 + 一致性。
Client-side check with server reconciliation: gateway pre-filters, Redis authoritatively decides.客户端预检 + 服务端权威：网关预过滤，Redis 最终拍板。

AI-specific twistAI 专属扩展

For LLM APIs, limit on tokens (input+output) not just requests. Requires predicting output tokens or reserving max then refunding unused.LLM API 的限流应基于 token（输入+输出）而非请求数。需预估输出 token 或先 reserve 最大再返还未用部分。

O17 · Design a Rate Limiter O17 · 设计限流器

Verified source经核实出处

Algorithms compared算法对比

Distributed coordination分布式协调

AI-specific twistAI 专属扩展

Related study-guide topics相关学习手册专题