O35 · Design Usage Metering & Billing for an LLM API
Verified source
Cross-referenced in multiple onsite reports (Blind, 1Point3Acres). Credibility B.
Architecture
```mermaid
flowchart LR
    API --> EV[Usage Event - request_id, tokens]
    EV --> STREAM[(Durable log - Kafka)]
    STREAM --> AGG[Streaming Aggregator]
    AGG --> LIVE[(Hot counters - Redis)]
    AGG --> WH[(Warehouse)]
    LIVE --> RL[Rate limiter]
    WH --> BILL[Billing reconciliation]
    BILL --> INV[Invoice service]
```
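A minimal sketch of the data flow in the diagram, assuming in-memory stand-ins for Kafka, Redis and the warehouse (the `Pipeline` class, its methods and the `tenant` field are hypothetical, not a real client API). The request path only appends to the log. A separate aggregator step consumes from a saved offset and fans out to both sinks.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class UsageEvent:
    request_id: str
    tenant: str  # hypothetical field: the account that gets billed
    tokens: int


@dataclass
class Pipeline:
    """In-memory stand-ins for the diagram's stores; not real Kafka or Redis clients."""
    log: list = field(default_factory=list)  # durable log (Kafka in the diagram)
    offset: int = 0  # how far the aggregator has consumed the log
    hot: defaultdict = field(default_factory=lambda: defaultdict(int))  # hot counters (Redis)
    warehouse: list = field(default_factory=list)  # rows used by billing reconciliation

    def emit(self, event: UsageEvent) -> None:
        # API -> Usage Event -> durable log: the only write on the request path.
        self.log.append(event)

    def aggregate(self) -> None:
        # Streaming aggregator: consume records past the saved offset, fan out to both sinks.
        for event in self.log[self.offset:]:
            self.hot[event.tenant] += event.tokens
            self.warehouse.append((event.request_id, event.tenant, event.tokens))
        self.offset = len(self.log)

    def allow(self, tenant: str, limit: int) -> bool:
        # Rate limiter consults only the hot counters, never the warehouse.
        return self.hot[tenant] < limit
```

If two events totalling 1,100 tokens are emitted, `allow("acme", 1000)` still returns `True` until `aggregate()` runs. The hot counters lag the log. That lag is tolerable for rate limiting but not for invoicing, which is why billing reads from the warehouse side.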
Key decisions
- **Dual-ledger**: fast Redis counters enforce rate limits, while a durable Kafka log is the auditable record that billing is computed from.
- **Idempotent events keyed on `request_id`**: replaying a duplicate event has no effect, so a reconciliation job can rebuild the ledger from the log.
- **Streaming responses**: the meter emits incremental token counts as generation proceeds; the final event carries the final count.
- **Time-bucket reconciliation**: hourly snapshots are used to detect drift between the fast counters and the durable log.
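The idempotency and streaming decisions can be combined in one aggregator, sketched here under stated assumptions. Each event carries a hypothetical per-request `seq` number so that retried increments can be told apart from new ones. Incremental events carry deltas. The final event carries the total and is *set* in the ledger rather than added, and the live counter is corrected by the difference. `MeterEvent`, `Aggregator` and `rebuild_ledger` are illustrative names, not an existing library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeterEvent:
    request_id: str
    tenant: str
    seq: int  # hypothetical per-request sequence number
    tokens: int  # delta for incremental events, total for the final event
    final: bool = False


class Aggregator:
    def __init__(self) -> None:
        self.seen: set[tuple[str, int]] = set()  # (request_id, seq) already applied
        self.partial: dict[str, int] = {}  # request_id -> running sum of deltas
        self.ledger: dict[str, tuple[str, int]] = {}  # request_id -> (tenant, final tokens)
        self.live: dict[str, int] = {}  # tenant -> live token count for rate limiting

    def apply(self, e: MeterEvent) -> None:
        # Skip redelivered events and anything arriving after the request was finalized.
        if (e.request_id, e.seq) in self.seen or e.request_id in self.ledger:
            return
        self.seen.add((e.request_id, e.seq))
        if e.final:
            # The final count is authoritative: correct the live counter by the difference.
            already = self.partial.pop(e.request_id, 0)
            self.live[e.tenant] = self.live.get(e.tenant, 0) + e.tokens - already
            self.ledger[e.request_id] = (e.tenant, e.tokens)
        else:
            self.partial[e.request_id] = self.partial.get(e.request_id, 0) + e.tokens
            self.live[e.tenant] = self.live.get(e.tenant, 0) + e.tokens


def rebuild_ledger(log: list[MeterEvent]) -> dict[str, tuple[str, int]]:
    # Reconciliation: replay the durable log through a fresh aggregator.
    agg = Aggregator()
    for e in log:
        agg.apply(e)
    return agg.ledger
```

Take a log with deltas of 50 and 30, a redelivered copy of the 30, and a final count of 85 sent twice. It bills 85 tokens once, and replaying the whole log again produces the same ledger. In a real system the `seen` set would need to be bounded, for example by dropping entries once a request is finalized.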
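Hourly drift detection might look like the following sketch. It assumes the hot counters are snapshotted per (tenant, hour) bucket and that log records can be read back as `(tenant, unix_seconds, tokens)` tuples. Both are hypothetical shapes, as is the 1% relative tolerance.

```python
from collections import defaultdict


def hourly_totals(events):
    """Sum tokens per (tenant, hour bucket) from the durable log.

    events: iterable of (tenant, unix_seconds, tokens) tuples (a hypothetical shape).
    """
    totals = defaultdict(int)
    for tenant, ts, tokens in events:
        totals[(tenant, ts // 3600)] += tokens
    return dict(totals)


def detect_drift(snapshot, log_totals, tolerance=0.01):
    """Return {bucket: fast - truth} for buckets whose relative drift exceeds tolerance."""
    drifted = {}
    for key in snapshot.keys() | log_totals.keys():
        fast = snapshot.get(key, 0)  # hot-counter snapshot for this bucket
        truth = log_totals.get(key, 0)  # log-derived total, treated as authoritative
        if abs(fast - truth) > tolerance * max(truth, 1):
            drifted[key] = fast - truth
    return drifted
```

Treating the log as the source of truth means a drifted bucket is repaired by overwriting the counter, not the other way around.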
Follow-ups
- Retries? The aggregator deduplicates by `request_id`.
- Multi-region? Keep counters region-local and merge them across regions once a day.
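Both follow-ups can be sketched in a single daily merge. This assumes each region keeps a per-request ledger (`request_id -> (tenant, tokens)`) alongside its counters. A request seen in more than one region, for example a retry that failed over, is billed once. Entries that disagree are surfaced as conflicts rather than silently summed. `merge_regions` and the ledger shape are hypothetical.

```python
from collections import defaultdict


def merge_regions(region_ledgers):
    """Daily cross-region merge, deduplicating by request_id.

    region_ledgers: {region: {request_id: (tenant, tokens)}} (a hypothetical shape).
    Returns (tokens per tenant, request_ids whose entries disagree across regions).
    """
    merged = {}
    conflicts = []
    for region in sorted(region_ledgers):  # deterministic order: first region wins
        for request_id, entry in region_ledgers[region].items():
            prior = merged.setdefault(request_id, entry)
            if prior != entry:
                conflicts.append(request_id)  # left for manual or billing review
    per_tenant = defaultdict(int)
    for tenant, tokens in merged.values():
        per_tenant[tenant] += tokens
    return dict(per_tenant), conflicts
```

Merging ledgers rather than raw counters costs more storage. It is what makes cross-region deduplication possible, since a plain counter no longer remembers which `request_id` values contributed to it.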