OpenAI ★★ · Frequent · Hard · RAG · Hybrid Retrieval · Rerank

O8 · Search/Recommendation System with LLMs

Verified source

Recruiter prompt: "We'll explore your experience with search, ranking, retrieval, and how to adapt LLMs to interact with such systems." — TeamBlind, 2025-10-22. Credibility C.

Split into two main chains

  1. Online retrieval + ranking: Query → recall (keyword + vector) → rerank → result page.
  2. LLM insertion points: query understanding/rewrite, result summary/conversational explanation, tool-use to trigger secondary retrieval.
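The first chain, with an LLM rewrite bolted on at the front, can be sketched end-to-end. This is a minimal sketch with stand-in callables; in production `keyword_recall`, `vector_recall`, and `rerank` would be BM25, an ANN index, and a cross-encoder respectively, and `rewrite` would be an LLM call:

```python
def answer(query, rewrite, keyword_recall, vector_recall, rerank, top_k=10):
    """Online chain: rewrite -> dual recall -> merge (dedup) -> rerank."""
    q = rewrite(query)                        # LLM insertion point: query rewrite
    candidates = keyword_recall(q) + vector_recall(q)
    seen, merged = set(), []
    for doc in candidates:                    # order-preserving dedup of the two lists
        if doc not in seen:
            seen.add(doc)
            merged.append(doc)
    return rerank(q, merged)[:top_k]          # cross-encoder or LLM rerank

# Toy wiring with deterministic stand-ins, just to show the data flow.
ranked = answer(
    "LLM Rerank",
    rewrite=str.lower,                        # stand-in for an LLM rewrite call
    keyword_recall=lambda q: ["d1", "d2"],
    vector_recall=lambda q: ["d2", "d3"],
    rerank=lambda q, docs: sorted(docs),      # stand-in for a cross-encoder
)
```

The same skeleton supports the tool-use insertion point: the rerank step can inspect the merged candidates and trigger a secondary retrieval pass before returning.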

Reference architecture

```mermaid
flowchart LR
  U[User] --> QP[Query Parser]
  QP --> KR["Keyword Search (BM25)"]
  QP --> VR["Vector Search (ANN)"]
  KR --> M[Merge]
  VR --> M
  M --> RR["Reranker (Cross-Encoder)"]
  RR --> LLM[LLM Layer]
  LLM --> U
```
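The Merge node is commonly implemented with reciprocal rank fusion (RRF), which combines the keyword and vector lists using only ranks, so no score calibration between BM25 and ANN distances is needed. A minimal sketch (the function name is illustrative; `k=60` is the conventional RRF constant):

```python
def rrf_merge(ranked_lists, k=60):
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank).

    ranked_lists: ranked doc-id lists (best first), e.g. one from BM25
    and one from ANN vector search. Returns doc ids sorted by fused score.
    """
    scores = {}
    for docs in ranked_lists:
        for rank, doc_id in enumerate(docs, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Toy example: the two recalls partially disagree; docs appearing in
# both lists (d1, d3) are pushed above single-list docs (d2, d4).
bm25 = ["d1", "d2", "d3"]
vector = ["d3", "d1", "d4"]
merged = rrf_merge([bm25, vector])  # -> ["d1", "d3", "d2", "d4"]
```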

Key trade-offs

  • LLM in recall (query rewrite): cheaper downstream; risk of recall bias.
  • LLM in rerank / summarize: safer but more expensive per query.
  • Index update: real-time writes sacrifice throughput; batch updates sacrifice freshness.

Evaluation (dual-track)

  • IR metrics: Recall@K, NDCG@10, MRR.
  • Generation quality: faithfulness, helpfulness, LLM-as-judge + human pairwise comparisons.
  • Online: A/B tests on click-through, dwell time, and a satisfaction proxy.

Safety lens

If safety comes up, reference Anthropic's Constitutional AI as a structured way to layer policy into the LLM step. A moderation pipeline on both the query and the answer is standard.
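That two-sided moderation is a thin wrapper around the LLM step. A sketch under stated assumptions: `moderate` and `llm_answer` are placeholders for whatever policy classifier and generation model the stack actually uses:

```python
def guarded_answer(query, llm_answer, moderate, refusal="I can't help with that."):
    """Run moderation on both the user query and the model answer.

    llm_answer: callable query -> answer text.
    moderate:   callable text -> True if the text violates policy.
    """
    if moderate(query):           # input-side check: block before spending tokens
        return refusal
    answer = llm_answer(query)
    if moderate(answer):          # output-side check: catch bad generations
        return refusal
    return answer

# Toy policy for illustration only: flag anything containing "forbidden".
flag = lambda text: "forbidden" in text
out = guarded_answer("tell me something forbidden", lambda q: "ok", flag)
```

Checking the query first saves the generation cost on blocked inputs; checking the answer as well catches violations the input-side filter cannot predict.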

Related study-guide topics