O23 · Enterprise GPT for a 20K-Employee Company (RAG + ACL)
Verified source
小红书 @Justin (2026) — the candidate got a Strong Hire with a 4-layer decomposition. Distinct from O16 (LLM-powered Enterprise Search) in five ways:
① explicit 20K-employee scale, ② per-document ACL as a first-class requirement, ③ a P95 < 2 s latency budget, ④ EN/中 multilingual support, ⑤ a continuous-learning loop driven by feedback. Credibility B/C (candidate-attributed, detailed).
Requirements (the exact interview spec)
- Accurate answers with traceable citations (down to the exact paragraph / version).
- ACL correctness: an employee sees only content they're authorised to see.
- P95 < 2 s (excluding model inference, which can stream).
- Multilingual: EN + 中 at minimum.
- Continuous learning: offline metrics improve steadily, online complaints drop.
- Sources: Confluence, Google Docs, PDFs, Slack announcements, ticket KBs, PRDs.
The 4-layer architecture (the Strong-Hire answer)
```mermaid
flowchart TB
  subgraph L1["① Ingestion & Chunking"]
    CONN[Connectors<br/>Confluence · GDocs · Slack · PDFs · Jira] --> NORM[Normalizer<br/>dedupe, lang-detect, PII scrub]
    NORM --> CHUNK[Chunker<br/>semantic + overlap]
    CHUNK --> EMB[Embedding<br/>multilingual e5 / bge-m3]
    EMB --> IDX[(Vector index<br/>+ keyword index)]
    CHUNK --> META[(Doc metadata<br/>acl_ids, version, lang)]
  end
  subgraph L2["② Retriever"]
    Q[Query] --> QR[Query rewrite<br/>LLM lite + HyDE opt.]
    QR --> HY[Hybrid search<br/>BM25 + vector]
    HY --> ACL[ACL filter<br/>user_acl ∩ chunk_acl]
    ACL --> RR[Reranker<br/>cross-encoder]
  end
  subgraph L3["③ Evaluator"]
    RR --> EV["Answerability & coverage check"]
    EV -->|insufficient| FB["Refuse / 'I don't know'"]
    EV -->|ok| GEN
  end
  subgraph L4["④ Generator"]
    GEN[Prompt assembly<br/>system + ctx + citations] --> LLM[LLM inference<br/>streaming]
    LLM --> POST["Post-check:<br/>citation binding + PII guard"]
    POST --> USR[User + feedback capture]
  end
  USR -. thumb/complaint .-> FB2[Feedback store] --> EVAL[Offline eval<br/>factuality, citation@k]
  EVAL --> L1
  EVAL --> L2
```
Per-document ACL done right
- Store the ACL with the chunk, not only the doc. Each chunk inherits the doc's ACL but can be overridden by section-level rules (e.g., a PRD appendix).
- ACL as a first-class filter in retrieval. Post-filtering leaks information (if the top-K are all restricted, "I know nothing" reveals their existence). Pre-filter via a posting-list intersection at the vector/keyword index level.
- Principal expansion: flatten user → groups → roles into a set of acl_ids at request time; cache per user in Redis with a short TTL.
- Never train on restricted content. Fine-tuning and eval sets are filtered by ACL; log hashes only, keeping an audit trail without leakage.
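The principal expansion and pre-filter steps above can be sketched in a few lines. This is a minimal in-memory illustration (all names are hypothetical): a real system would push the intersection into the vector/keyword index as a posting-list filter and cache the expanded principal set per user in Redis.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    doc_id: str
    chunk_id: int
    acl_ids: frozenset  # principals allowed to read this chunk
    text: str = ""

def expand_principals(user: str, user_groups: dict, group_roles: dict) -> frozenset:
    """Flatten user -> groups -> roles into one acl_id set (cacheable per user)."""
    groups = set(user_groups.get(user, ()))
    roles = {r for g in groups for r in group_roles.get(g, ())}
    return frozenset({user} | groups | roles)

def acl_prefilter(chunks, user_acl: frozenset):
    """Pre-filter BEFORE ranking: a chunk survives only if the user's principal
    set intersects its acl_ids. Restricted chunks never enter the top-K, so an
    empty answer cannot reveal that they exist."""
    return [c for c in chunks if user_acl & c.acl_ids]
```

Because the filter runs before scoring, the candidate pool handed to the reranker is already leak-free, which is the property the post-filter approach cannot guarantee.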
P95 < 2 s latency budget (retrieval stack; inference excluded)
| Stage | Target | Technique |
|---|---|---|
| Query rewrite | < 150 ms | Small model; skipped when the query is unambiguous. |
| Hybrid retrieval | < 250 ms | Parallel BM25 + vector; ACL pre-filter at the index level. |
| Rerank | < 400 ms | Top-100 → top-8 cross-encoder; batched; cache top rerank results. |
| Evaluator + prompt build | < 200 ms | Simple heuristics + template fill. |
| First token (model) | streamed | Out of budget by design → the user sees streaming output. |
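The hybrid-retrieval row can be sketched as two legs run concurrently under the 250 ms budget. The search functions below are stubs, and the merge uses reciprocal-rank fusion, an assumption on my part (the post names no fusion method); RRF is a common choice because the BM25 and vector score scales then need no calibration.

```python
import asyncio

async def bm25_search(query: str, user_acl):
    await asyncio.sleep(0)  # stands in for keyword-index I/O
    return [("d1:3", 7.2), ("d2:1", 5.1)]

async def vector_search(query: str, user_acl):
    await asyncio.sleep(0)  # stands in for ANN-index I/O
    return [("d2:1", 0.91), ("d3:0", 0.80)]

async def hybrid_search(query: str, user_acl, timeout_s: float = 0.25):
    """Run both legs in parallel under one latency budget, then merge by
    reciprocal-rank fusion: score(c) = sum over legs of 1 / (60 + rank)."""
    kw, vec = await asyncio.wait_for(
        asyncio.gather(bm25_search(query, user_acl),
                       vector_search(query, user_acl)),
        timeout=timeout_s)
    fused = {}
    for results in (kw, vec):
        for rank, (chunk_id, _score) in enumerate(results):
            fused[chunk_id] = fused.get(chunk_id, 0.0) + 1.0 / (60 + rank)
    return sorted(fused, key=fused.get, reverse=True)
```

A chunk that appears in both legs (here `d2:1`) accumulates two reciprocal-rank terms and reliably wins, which is exactly the agreement signal hybrid search is after.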
Multilingual (EN + 中)
- Multilingual embeddings (bge-m3, e5-multilingual): one shared vector space, no translate-at-query step.
- Chunk-language tags → boost chunks in the user's language, but don't exclude cross-language hits.
- Answer language = the user's question language by default; quote citations in the original language.
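The language-boost rule amounts to a small post-rerank score adjustment. A minimal sketch, where the 1.15 factor is an illustrative assumption (it would be tuned on the offline eval, not hard-coded):

```python
def language_boost(results, user_lang: str, boost: float = 1.15):
    """Multiply the rerank score of same-language chunks by a small factor.
    Cross-language hits stay in the candidate set, just slightly behind,
    rather than being filtered out.

    results: iterable of (chunk_id, lang, score) tuples.
    Returns (boosted_score, chunk_id) pairs, best first."""
    return sorted(
        ((score * (boost if lang == user_lang else 1.0), chunk_id)
         for chunk_id, lang, score in results),
        reverse=True)
```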
Continuous learning loop
- Capture (query, retrieved, answer, thumb, complaint) tuples.
- Weekly offline eval: factuality, citation@k, refusal correctness, ACL-leak probes.
- Hard-negative mining for retriever fine-tuning; prompt-template A/B tests.
- Gate rollout on "complaint rate ↓ and factuality ↑, both statistically significant".
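One way to implement that rollout gate is a one-sided two-proportion z-test on the complaint rate, combined with whatever p-value the offline factuality eval reports. This is a sketch under assumptions of mine (the post names no specific test, and the α = 0.05 threshold is illustrative):

```python
import math

def ztest_proportions(x1: int, n1: int, x2: int, n2: int) -> float:
    """One-sided two-proportion z-test. Returns the p-value for
    H1: rate2 < rate1 (e.g., complaints dropped after rollout)."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                        # pooled rate under H0
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return 0.5 * math.erfc(z / math.sqrt(2))          # P(Z >= z), standard normal

def gate_rollout(complaints_before: int, n_before: int,
                 complaints_after: int, n_after: int,
                 fact_gain_p: float, alpha: float = 0.05) -> bool:
    """Ship only if the complaint-rate drop AND the factuality gain are
    both significant (the factuality p-value comes from the offline eval)."""
    drop_p = ztest_proportions(complaints_before, n_before,
                               complaints_after, n_after)
    return drop_p < alpha and fact_gain_p < alpha
```

Requiring both tests to pass keeps a prompt-template A/B from shipping on a complaint-rate fluke alone.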
Follow-ups (the original post says the interviewer drilled into the Generator)
- Citation binding: force the model to emit [doc_id:chunk_id] tokens; a post-checker verifies every claim maps to a cited chunk; drop uncited sentences or refuse.
- Fresh write → instant availability? Dual-write to a "recent" index (small, cheap) while the batch indexer catches up; merge at query time.
- An eval that catches ACL leaks? A red-team probe set: known sensitive questions issued as different personas; assert that no restricted chunk appears in retrieval.
- Cost guardrails? A per-employee monthly token budget; shrink context aggressively when near the cap.
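The citation-binding post-check can be sketched as a sentence-level filter. This is a deliberately simplified version: it only checks that every cited [doc_id:chunk_id] token resolves to a chunk that was actually retrieved and drops sentences with no citation; real claim-to-source verification would add an entailment/NLI check on top.

```python
import re

CITE = re.compile(r"\[([\w-]+):(\d+)\]")   # matches tokens like [vpn-guide:3]

def bind_citations(answer: str, retrieved_ids: set) -> str:
    """Keep only sentences whose [doc_id:chunk_id] tokens all resolve to
    retrieved chunks; drop uncited or mis-cited sentences; refuse when
    nothing survives."""
    kept = []
    for sent in re.split(r"(?<=[.!?])\s+", answer.strip()):
        cites = [(doc, int(chunk)) for doc, chunk in CITE.findall(sent)]
        if cites and all(c in retrieved_ids for c in cites):
            kept.append(sent)
    return " ".join(kept) if kept else "I don't know."
```

Forcing the refusal path when no sentence survives is what turns the post-check from a cosmetic filter into the hallucination guard the follow-up is asking about.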