O17 · Design a Rate Limiter O17 · 设计限流器
Verified source经核实出处
Prompt: "Design a Rate Limiter." — Jobright, IGotAnOffer. Classic with AI twist (per-token billing). Credibility C/D.
Algorithms compared算法对比
| Algorithm | Properties | Use-case |
|---|---|---|
| Token bucket | Burst allowed up to bucket size, steady refill.允许突发到桶大小,稳定续 token。 | Most APIs — OpenAI-style.多数 API,OpenAI 风格。 |
| Leaky bucket | Strict smoothing; queue with fixed drain.严格平滑;定速流出队列。 | Traffic shaping; streaming.流量整形;流式场景。 |
| Fixed window | Simple; burst at window boundary.简单;窗口边界有突发。 | Coarse quotas, daily limits.粗粒度配额、日限。 |
| Sliding window log | Exact but memory-heavy.精确但内存开销大。 | Small scale, accuracy-critical.小规模,精度关键。 |
| Sliding window counter | Approximate; O(1) memory.近似;O(1) 内存。 | Most APIs.多数 API。 |
Distributed coordination分布式协调
- Redis + Lua for atomic token-bucket operations (check-and-decrement).Redis + Lua原子化令牌桶(先查后扣一步完成)。
- Sticky hashing of key (user/API key) to specific shard → locality + consistency.按 key 黏附哈希(用户/API key)到特定 shard → 局部性 + 一致性。
- Client-side check with server reconciliation: gateway pre-filters, Redis authoritatively decides.客户端预检 + 服务端权威:网关预过滤,Redis 最终拍板。
AI-specific twistAI 专属扩展
For LLM APIs, limit on tokens (input+output) not just requests. Requires predicting output tokens or reserving max then refunding unused.LLM API 的限流应基于 token(输入+输出)而非请求数。需预估输出 token 或先 reserve 最大再返还未用部分。