OpenAI ★★★ Frequent · Hard · RAG · ACL · Citations

O23 · Enterprise GPT for a 20K-Employee Company (RAG + ACL)

Verified source

小红书 @Justin (2026) — the candidate got a Strong Hire with a 4-layer decomposition. Distinct from O16 (LLM-powered Enterprise Search) in five ways:
① explicit 20K-employee scale, ② per-document ACL as a first-class requirement, ③ a P95 < 2 s latency budget, ④ EN/ZH multilingual support, ⑤ a continuous-learning loop driven by feedback. Credibility B/C (candidate-attributed, detailed).

Requirements (the exact interview spec)

  1. Accurate answers with traceable citations (down to the exact paragraph / version).
  2. ACL correctness: an employee sees only content they are authorised to see.
  3. P95 < 2 s (excluding model inference, which can stream).
  4. Multilingual: EN + ZH at minimum.
  5. Continuous learning: offline metrics improve; online complaint rates drop.
  6. Sources: Confluence, Google Docs, PDFs, Slack announcements, ticket KBs, PRDs.

The 4-layer architecture (the Strong-Hire answer)

```mermaid
flowchart TB
  subgraph L1["① Ingestion & Chunking"]
    CONN["Connectors<br/>Confluence · GDocs · Slack · PDFs · Jira"] --> NORM["Normalizer<br/>dedupe, lang-detect, PII scrub"]
    NORM --> CHUNK["Chunker<br/>semantic + overlap"]
    CHUNK --> EMB["Embedding<br/>multilingual e5 / bge-m3"]
    EMB --> IDX[("Vector Index<br/>+ keyword index")]
    CHUNK --> META[("Doc Metadata<br/>acl_ids, version, lang")]
  end
  subgraph L2["② Retriever"]
    Q["Query"] --> QR["Query rewrite<br/>LLM lite + HyDE opt."]
    QR --> HY["Hybrid search<br/>BM25 + vector"]
    HY --> ACL["ACL filter<br/>user_acl ∩ chunk_acl"]
    ACL --> RR["Reranker<br/>cross-encoder"]
  end
  subgraph L3["③ Evaluator"]
    RR --> EV["Answerability & coverage check"]
    EV -->|insufficient| FB["Refuse / 'I don't know'"]
    EV -->|ok| GEN
  end
  subgraph L4["④ Generator"]
    GEN["Prompt assembly<br/>system + ctx + citations"] --> LLM["LLM inference<br/>streaming"]
    LLM --> POST["Post-check:<br/>citation binding + PII guard"]
    POST --> USR["User + feedback capture"]
  end
  USR -. thumb/complaint .-> FB2["Feedback store"]
  FB2 --> EVAL["Offline eval<br/>factuality, citation@k"]
  EVAL --> L1
  EVAL --> L2
```

Per-document ACL done right

  • Store ACL with the chunk, not only the doc. Each chunk inherits the doc's ACL but can be overridden by section-level rules (e.g., a PRD appendix).
  • ACL as a first-class filter in retrieval. Post-filtering leaks information (the top-K might all be restricted, so "I know nothing" reveals the document's existence). Use pre-filtering via a posting-list intersection at the vector/keyword index level.
  • Principal expansion: flatten user → groups → roles into a set of acl_ids at request time; cache per user in Redis with a short TTL.
  • Never train on restricted content. Fine-tuning and eval sets are filtered by ACL; log hashes only, keeping an audit trail without leakage.
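A minimal sketch of the first three bullets, with in-memory stand-ins: principals are expanded once per request, and candidates are filtered by acl_id intersection before any ranking. The names (`expand_principals`, `Chunk`, the group/role tables) are hypothetical, not from the post.

```python
# ACL-aware retrieval sketch: expand the caller's principals, then
# pre-filter chunks by acl_id intersection *before* ranking, so
# restricted content never shapes the top-K.
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    text: str
    acl_ids: frozenset  # inherited from the doc, overridable per section

# Toy identity tables; in production these come from the IdP / directory.
GROUPS = {"alice": {"eng", "prd-readers"}, "bob": {"sales"}}
ROLES = {"eng": {"role:engineer"}, "sales": {"role:gtm"}}

def expand_principals(user: str) -> frozenset:
    """Flatten user -> groups -> roles into one acl_id set (cacheable per user)."""
    ids = {f"user:{user}"}
    for g in GROUPS.get(user, ()):
        ids.add(f"group:{g}")
        ids |= ROLES.get(g, set())
    return frozenset(ids)

def acl_prefilter(chunks, user_acl):
    """Keep only chunks the caller may see: user_acl ∩ chunk_acl ≠ ∅."""
    return [c for c in chunks if user_acl & c.acl_ids]

chunks = [
    Chunk("c1", "public handbook", frozenset({"group:eng", "group:sales"})),
    Chunk("c2", "PRD appendix", frozenset({"role:engineer"})),
]
alice, bob = expand_principals("alice"), expand_principals("bob")
print([c.chunk_id for c in acl_prefilter(chunks, alice)])  # ['c1', 'c2']
print([c.chunk_id for c in acl_prefilter(chunks, bob)])    # ['c1']
```

In a real index the same intersection runs as a filter clause pushed into the vector/keyword engine, not as a Python loop over candidates.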

P95 < 2 s latency budget (retrieval stack, excluding inference)

| Stage | Target | Technique |
| --- | --- | --- |
| Query rewrite | < 150 ms | Small model; skip if the query is unambiguous. |
| Hybrid retrieval | < 250 ms | Parallel BM25 + vector; ACL pre-filter at the index level. |
| Rerank | < 400 ms | Top-100 → top-8 cross-encoder; batching; cache top rerank results. |
| Evaluator + prompt build | < 200 ms | Simple heuristics + template fill. |
| First token (model) | streamed | Out of budget by design → the user sees streaming output. |
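The "parallel BM25 + vector" row can be sketched with stdlib threads and a hard deadline: fire both legs, merge whatever returns within the stage budget, and abandon stragglers. The backends here are sleep-based stubs; real ones would be index clients.

```python
# Hybrid retrieval under a per-stage deadline (budget numbers from the table).
import time
from concurrent.futures import ThreadPoolExecutor, wait

def bm25_search(q):    time.sleep(0.01); return [("c1", 1.2), ("c2", 0.8)]
def vector_search(q):  time.sleep(0.01); return [("c2", 0.9), ("c3", 0.7)]

def hybrid(query, budget_s=0.25):
    ex = ThreadPoolExecutor(max_workers=2)
    futs = [ex.submit(bm25_search, query), ex.submit(vector_search, query)]
    done, _not_done = wait(futs, timeout=budget_s)
    # Don't block on stragglers; their threads finish in the background
    # but their results are ignored for this request.
    ex.shutdown(wait=False)
    merged = {}
    for f in done:  # merge whatever legs beat the deadline, max-score fusion
        for cid, score in f.result():
            merged[cid] = max(merged.get(cid, 0.0), score)
    return sorted(merged, key=merged.get, reverse=True)

print(hybrid("vpn policy"))  # ['c1', 'c2', 'c3']
```

Max-score fusion is just one illustrative merge policy; reciprocal-rank fusion is a common alternative at this step.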

Multilingual (EN + ZH)

  • Multilingual embeddings (bge-m3, e5-multilingual): one shared vector space, no translate-at-query step.
  • Chunk-language tags → boost chunks in the user's language, but don't exclude cross-language hits.
  • Answer language = the user's question language by default; citations keep the original text.

Continuous learning loop

  1. Capture (query, retrieved, answer, thumb, complaint) tuples.
  2. Weekly offline eval: factuality, citation@k, refusal correctness, ACL-leak probes.
  3. Hard-negative mining for retriever fine-tuning; A/B tests on prompt templates.
  4. Gate rollout on "complaint rate down and factuality up, both statistically significant".
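Step 4's gate can be made concrete with a one-sided two-proportion z-test on each metric; the test choice, thresholds, and sample numbers below are illustrative assumptions.

```python
# Rollout gate: require BOTH "complaint rate down" and "factuality up"
# to clear a one-sided z-test (z_crit = 1.645 ~ p < 0.05).
import math

def z_test_prop(success_a, n_a, success_b, n_b):
    """One-sided pooled z statistic for H1: p_b > p_a."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

def gate(complaints_old, n_old, complaints_new, n_new,
         fact_old, fn_old, fact_new, fn_new, z_crit=1.645):
    # Complaint drop: old rate must significantly exceed the new rate.
    complaint_drop = z_test_prop(complaints_new, n_new, complaints_old, n_old)
    # Factuality gain: new rate must significantly exceed the old rate.
    fact_gain = z_test_prop(fact_old, fn_old, fact_new, fn_new)
    return complaint_drop > z_crit and fact_gain > z_crit

# 2% -> 1% complaints over 10k sessions; 80% -> 85% factuality over 2k eval Qs.
print(gate(200, 10_000, 100, 10_000, 1600, 2000, 1700, 2000))  # True
```

Gating on the conjunction (rather than either metric alone) blocks regressions where, say, the retriever gets more aggressive, factuality climbs, but complaint volume holds steady.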

Follow-ups (the original post says the Generator layer was probed hardest)

  • Citation binding: force the model to emit [doc_id:chunk_id] tokens; a post-checker verifies every claim maps to a cited chunk; drop uncited sentences or refuse.
  • Fresh writes → instant availability? Dual-write to a small, cheap "recent" index while the batch indexer catches up; merge the two at query time.
  • An eval that catches ACL leaks? A red-team probe set: known-sensitive questions issued under different personas; assert that no restricted chunk appears in retrieval.
  • Cost guardrails? A per-employee monthly token budget; shrink context aggressively when near the cap.
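The citation-binding post-check from the first bullet can be sketched as a sentence-level filter: every sentence must carry at least one [doc_id:chunk_id] marker, and every marker must resolve to a chunk that was actually retrieved. The marker regex and sentence splitter are simplifying assumptions.

```python
# Post-checker: drop sentences that are uncited, or cite chunks that
# were not in the retrieved set (a hallucinated citation).
import re

CITE = re.compile(r"\[([\w-]+):([\w-]+)\]")

def bind_citations(answer: str, retrieved_ids: set) -> str:
    kept = []
    for sent in re.split(r"(?<=[.!?])\s+", answer.strip()):
        cites = {f"{d}:{c}" for d, c in CITE.findall(sent)}
        if cites and cites <= retrieved_ids:  # cited, and only to real chunks
            kept.append(sent)
    return " ".join(kept)

retrieved = {"hr-42:c3", "it-7:c1"}
ans = ("VPN is mandatory [it-7:c1]. Bonuses double in 2027. "
       "Expense via SAP [hr-42:c3].")
print(bind_citations(ans, retrieved))
# → 'VPN is mandatory [it-7:c1]. Expense via SAP [hr-42:c3].'
```

The unsupported middle sentence is silently dropped here; a stricter variant would refuse the whole answer once any sentence fails the check.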

Related study-guide topics