O32 · Design a Content Moderation Pipeline
Verified source
OpenAI Moderation API (docs). Interview reports confirm that design questions on this topic are asked. Credibility: A.
Architecture
flowchart LR
    Req --> IN[Input Moderation]
    IN -->|block| Reject
    IN -->|allow| LLM
    LLM --> OUT[Output Moderation]
    OUT -->|block| Scrub
    OUT -->|flag| Q[Review Queue]
    OUT --> Deliver
    Q --> Human
    Human --> LBL[(Labels DB)]
    LBL --> Trainer
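The routing in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `Pipeline`, `Verdict`, and the three callables are hypothetical names, and the choice to hold flagged completions for review instead of delivering them is an assumption the diagram does not settle. The Human, Labels DB, and Trainer loop is left out.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Verdict(Enum):
    ALLOW = "allow"
    FLAG = "flag"
    BLOCK = "block"


@dataclass
class Pipeline:
    # All three callables are hypothetical stand-ins for real services.
    moderate_input: Callable[[str], Verdict]
    generate: Callable[[str], str]
    moderate_output: Callable[[str], Verdict]
    review_queue: List[dict] = field(default_factory=list)

    def handle(self, prompt: str) -> dict:
        # Input side: reject before spending any generation compute.
        if self.moderate_input(prompt) is Verdict.BLOCK:
            return {"status": "rejected", "text": None}

        completion = self.generate(prompt)

        # Output side: block, flag for human review, or deliver.
        verdict = self.moderate_output(completion)
        if verdict is Verdict.BLOCK:
            return {"status": "scrubbed", "text": None}
        if verdict is Verdict.FLAG:
            # Assumption: flagged output is withheld until a human labels it.
            self.review_queue.append({"prompt": prompt, "completion": completion})
            return {"status": "queued", "text": None}
        return {"status": "delivered", "text": completion}
```

Items appended to `review_queue` are what a human reviewer would label, and those labels would then feed the trainer.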
Key decisions
- **Two-sided moderation**: score both prompts and completions, because a jailbreak can use a benign-looking prompt to elicit harmful output.
- **Classifier ensemble**: a fast, small model screens all traffic, and a deeper model re-checks only the edge cases.
- **Policy categories as a structured schema**, not a binary verdict, with a separate threshold per category.
- **Shadow eval in prod**: 1% of production traffic is also scored by a canary classifier in shadow mode.
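The decisions above can be made concrete with a small sketch. Everything named here is an assumption for illustration: the category names, the threshold values, the `EDGE_MARGIN` rule for what counts as an edge case, and the hash-based 1% sampler are hypothetical and not taken from any real moderation API.

```python
import hashlib
from typing import Callable, Dict, List

Scores = Dict[str, float]

# Hypothetical categories and thresholds; real values would come from policy.
BLOCK_THRESHOLDS: Scores = {"hate": 0.80, "self_harm": 0.60, "violence": 0.85}
EDGE_MARGIN = 0.10  # assumed: a score this close to its threshold is an edge case


def is_edge_case(scores: Scores, thresholds: Scores, margin: float) -> bool:
    """True if any category score lies within `margin` of its threshold."""
    return any(abs(scores.get(c, 0.0) - t) <= margin for c, t in thresholds.items())


def violated(scores: Scores, thresholds: Scores) -> List[str]:
    """Categories whose score meets or exceeds that category's threshold."""
    return sorted(c for c, t in thresholds.items() if scores.get(c, 0.0) >= t)


def cascade(
    text: str,
    fast: Callable[[str], Scores],
    deep: Callable[[str], Scores],
    thresholds: Scores = BLOCK_THRESHOLDS,
    margin: float = EDGE_MARGIN,
) -> List[str]:
    """Score with the fast model; call the deep model only on edge cases."""
    scores = fast(text)
    if is_edge_case(scores, thresholds, margin):
        scores = deep(text)
    return violated(scores, thresholds)


def in_shadow_sample(request_id: str, rate: float = 0.01) -> bool:
    """Deterministically pick roughly `rate` of requests for shadow scoring."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < rate
```

Returning a list of violated categories, not a single boolean, keeps the per-category structure visible to downstream policy. Hashing the request ID makes the shadow sample reproducible, so the same request always lands in the same bucket.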
Follow-ups
- Multilingual? Per-language thresholds, plus a translation-pivot classifier (translate to English, then classify).
- Latency budget? Input moderation ≤ 30 ms p99; output moderation runs concurrently with generation.
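One way to run output moderation concurrently with generation is to score the growing text in background tasks while tokens are still streaming. The sketch below uses `asyncio` and assumes a hypothetical async classifier `is_harmful` and a hypothetical `window` parameter; it buffers the completion and releases it only after every check has passed, which is one possible design among several.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List


async def stream_with_moderation(
    tokens: AsyncIterator[str],
    is_harmful: Callable[[str], Awaitable[bool]],
    window: int = 4,
) -> str:
    """Consume a token stream while moderation checks run in the background.

    Returns the full text if every check passes, or "" if any check fails.
    """
    parts: List[str] = []
    checks: List[asyncio.Task] = []
    async for tok in tokens:
        parts.append(tok)
        # Every `window` tokens, score the text so far without awaiting it.
        if len(parts) % window == 0:
            checks.append(asyncio.create_task(is_harmful("".join(parts))))
        # Stop generating early if a finished check already reported harm.
        if any(t.done() and t.result() for t in checks):
            break
    # A final check covers the tail that did not align with a window boundary.
    checks.append(asyncio.create_task(is_harmful("".join(parts))))
    results = await asyncio.gather(*checks)
    return "" if any(results) else "".join(parts)
```

With this shape, only the final check adds latency after generation ends, since the earlier checks overlap with token production.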