OpenAI ★★ Frequent · Hard · Moderation / Safety / Classifier

O32 · Design a Content Moderation Pipeline

Verified source

OpenAI Moderation API (docs). Interview reports confirm that design questions of this kind are asked. Credibility: A.

Architecture

```mermaid
flowchart LR
  Req --> IN[Input Moderation]
  IN -->|block| Reject
  IN -->|allow| LLM
  LLM --> OUT[Output Moderation]
  OUT -->|block| Scrub
  OUT -->|flag| Q[Review Queue]
  OUT -->|allow| Deliver
  Q --> Human
  Human --> LBL[(Labels DB)]
  LBL --> Trainer
```
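The request flow above can be sketched in a few lines. Everything here is a toy stand-in, not a real API: `moderate` is a placeholder keyword classifier, the default `llm` just echoes the prompt, and `review_queue` represents the path to human review and the labels DB.

```python
from dataclasses import dataclass

@dataclass
class ModerationResult:
    blocked: bool
    flagged: bool = False

def moderate(text: str) -> ModerationResult:
    # Placeholder classifier: block an obvious policy term, flag a borderline one.
    lowered = text.lower()
    if "forbidden" in lowered:
        return ModerationResult(blocked=True)
    return ModerationResult(blocked=False, flagged="borderline" in lowered)

review_queue: list[str] = []  # feeds human review -> Labels DB -> Trainer

def handle_request(prompt: str, llm=lambda p: f"echo: {p}") -> str:
    if moderate(prompt).blocked:          # input moderation
        return "[rejected]"
    completion = llm(prompt)
    out = moderate(completion)            # output moderation
    if out.blocked:
        return "[scrubbed]"               # scrub before delivery
    if out.flagged:
        review_queue.append(completion)   # queue for async human review
    return completion                     # deliver (flagged content still ships)
```

Note that, as in the diagram, a *flagged* completion is both delivered and queued for review; only a *blocked* one is scrubbed.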

Key decisions

  • **Two-sided moderation**: score both prompts and completions, since jailbreaks can craft benign prompts that elicit harmful outputs.
  • **Classifier ensemble**: a fast small model screens everything; a deeper model re-scores edge cases.
  • **Policy categories as a structured schema**, not a binary flag; each category gets its own threshold.
  • **Shadow eval in prod**: 1% of traffic is also scored by a canary classifier.
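Two of these decisions compose naturally: per-category thresholds over structured scores, with borderline fast-model scores escalated to the deep model. A minimal sketch, in which the category names, threshold values, escalation band, and `deep_model` callable are all illustrative assumptions:

```python
# Per-category thresholds (illustrative numbers, not real policy values).
THRESHOLDS = {"hate": 0.40, "self_harm": 0.20, "violence": 0.55}
ESCALATION_BAND = 0.15  # fast scores this close to a threshold go to the deep model

def decide(fast_scores: dict[str, float], deep_model=None) -> str:
    """Return 'block' or 'allow' given the fast model's per-category scores.

    deep_model is a hypothetical callable: category name -> refined score.
    """
    for cat, score in fast_scores.items():
        t = THRESHOLDS[cat]
        if score >= t:
            return "block"                      # fast model is confident enough
        if deep_model and score >= t - ESCALATION_BAND:
            # Edge case: re-score only this borderline sample with the heavy model.
            if deep_model(cat) >= t:
                return "block"
    return "allow"
```

The escalation band is the key cost lever: widening it buys recall on hard cases at the price of more deep-model calls.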

Follow-ups

  • Multilingual? Per-language thresholds plus a pivot classifier that translates to English before scoring.
  • Latency budget? Input moderation ≤ 30 ms p99; output moderation runs concurrently with generation.
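One way to keep output moderation off the critical path is to score each streamed chunk as it arrives rather than waiting for the full completion. A toy `asyncio` sketch, where the generator and scorer are stand-ins and the chunking/scrubbing policy is an assumption:

```python
import asyncio

async def generate():                      # stand-in for a streaming LLM
    for chunk in ["fine ", "also fine ", "bad-chunk"]:
        await asyncio.sleep(0)             # yield control, as real I/O would
        yield chunk

async def score(chunk: str) -> bool:       # True = policy violation (toy rule)
    return "bad" in chunk

async def stream_with_moderation() -> list[str]:
    delivered = []
    async for chunk in generate():
        if await score(chunk):             # moderation overlaps generation latency
            delivered.append("[scrubbed]")
            break                          # halt generation on a violation
        delivered.append(chunk)
    return delivered

# asyncio.run(stream_with_moderation()) -> ["fine ", "also fine ", "[scrubbed]"]
```

Chunk-level scoring trades some context (a violation split across chunks can slip through) for latency, which is why a full-completion re-score often runs behind it.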

Related study-guide topics