xAI ★ Emerging · Medium · Ranking · RecSys · LLM

X4 · Design X For-You Re-ranking with Grok

Verified source

X's ranking algorithm is open source (github.com/twitter/the-algorithm). xAI has stated that Grok will increasingly power ranking. Credibility: B.

Problem

X has ~200M DAU, and each user refreshes the For-You timeline many times per day. Candidate generation produces ~1500 posts per request; final ranking must score them and return the top 20. Design an LLM-in-the-loop re-ranker that adds a Grok-based relevance signal without exceeding a 100 ms latency budget.
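A back-of-envelope load estimate helps size the problem before choosing an architecture. The sketch below uses the DAU and candidate counts from the problem statement; the refresh rate and peak-to-average factor are placeholder assumptions (the problem only says "many times per day"), so the resulting numbers are illustrative, not measured.

```python
SECONDS_PER_DAY = 86_400

DAU = 200_000_000                # from the problem statement (~200M)
CANDIDATES_PER_REQUEST = 1_500   # from the problem statement
REFRESHES_PER_USER_PER_DAY = 10  # ASSUMPTION: stand-in for "many times per day"
PEAK_TO_AVERAGE = 3.0            # ASSUMPTION: hypothetical diurnal peak factor


def load_estimate(
    dau: int = DAU,
    refreshes: int = REFRESHES_PER_USER_PER_DAY,
    peak_factor: float = PEAK_TO_AVERAGE,
    candidates: int = CANDIDATES_PER_REQUEST,
) -> dict[str, float]:
    """Return average/peak request rates and peak first-stage scoring volume."""
    avg_qps = dau * refreshes / SECONDS_PER_DAY
    peak_qps = avg_qps * peak_factor
    return {
        "avg_qps": avg_qps,
        "peak_qps": peak_qps,
        # Every request pushes all candidates through the first ranking stage.
        "peak_candidates_per_s": peak_qps * candidates,
    }


if __name__ == "__main__":
    for name, value in load_estimate().items():
        print(f"{name}: {value:,.0f}")
```

Under these assumed inputs the first stage sees on the order of 10^8 candidate scorings per second at peak, which is why any per-candidate LLM forward pass has to be pushed as late in the pipeline as possible.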

Architecture

flowchart LR
  U[User opens X] --> CG[Candidate generator ~1500]
  CG --> L1[Lightweight tower model, top 200]
  L1 --> L2[Grok-mini embedding re-rank, top 50]
  L2 --> L3[Heavy Grok scorer, top 20]
  L3 --> TL[For-You timeline]
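The flowchart above can be read as three successive top-k cuts. The following is a minimal sketch of that control flow only: `rerank_funnel`, `top_k`, and the three scorer callables are hypothetical names, and the scorers stand in for the tower model, the embedding re-rank, and the heavy Grok scorer, none of which are specified in the source.

```python
import heapq
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")
Scorer = Callable[[T], float]


def top_k(candidates: Sequence[T], scorer: Scorer, k: int) -> list[T]:
    """Score every candidate exactly once and keep the k highest, best first."""
    scored = [(scorer(c), i, c) for i, c in enumerate(candidates)]
    # The index i breaks ties so candidates themselves are never compared.
    return [c for _, _, c in heapq.nlargest(k, scored)]


def rerank_funnel(
    candidates: Sequence[T],
    light: Scorer,   # stand-in for the lightweight tower model
    embed: Scorer,   # stand-in for the embedding dot-product re-rank
    heavy: Scorer,   # stand-in for the heavy Grok scorer
    sizes: tuple[int, int, int] = (200, 50, 20),  # cut sizes from the flowchart
) -> list[T]:
    """Run ~1500 -> 200 -> 50 -> 20, so each stage sees fewer candidates."""
    stage1 = top_k(candidates, light, sizes[0])
    stage2 = top_k(stage1, embed, sizes[1])
    return top_k(stage2, heavy, sizes[2])
```

With the default sizes, the most expensive scorer is invoked on only 50 of the ~1500 candidates per request.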

Key decisions

  • Three-stage funnel: classic model → small-LLM embedding → heavy LLM. The LLM stages only run on progressively smaller candidate sets.
  • Precompute post embeddings at ingest time, so the embedding re-rank is a dot product per candidate, not a forward pass.
  • The user intent vector is updated from the last 10 interactions, which gives personalization without a full profile scan.
  • Only the final heavy-scoring stage, which scores the 50 surviving candidates to pick the top 20, invokes full Grok; this is the latency-critical loop.
  • Evaluation: run an online A/B test against the current heuristic stack, measuring dwell time and report rate, not just CTR.
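The precomputed-embedding and intent-vector decisions can be sketched together. This is an illustrative sketch, not the production method: the source does not say how the intent vector is derived, so the example assumes it is the normalized mean of the embeddings of the last 10 posts the user interacted with. `UserIntent` and `embedding_rerank` are hypothetical names.

```python
import math
from collections import deque


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class UserIntent:
    """Rolling intent vector over the most recent interactions.

    ASSUMPTION: the vector is the L2-normalized mean of the embeddings of
    the last `window` posts the user interacted with.
    """

    def __init__(self, dim: int, window: int = 10) -> None:
        self.dim = dim
        self.recent: deque[list[float]] = deque(maxlen=window)

    def observe(self, post_embedding: list[float]) -> None:
        # deque(maxlen=...) silently drops the oldest interaction when full.
        self.recent.append(post_embedding)

    def vector(self) -> list[float]:
        if not self.recent:
            return [0.0] * self.dim
        mean = [sum(col) / len(self.recent) for col in zip(*self.recent)]
        norm = math.sqrt(dot(mean, mean))
        return [x / norm for x in mean] if norm > 0 else mean


def embedding_rerank(
    intent: UserIntent,
    post_embeddings: dict[str, list[float]],  # computed once at ingest time
    k: int,
) -> list[str]:
    """Rank post ids by dot product with the intent vector; no model call."""
    u = intent.vector()
    ranked = sorted(post_embeddings, key=lambda pid: dot(u, post_embeddings[pid]), reverse=True)
    return ranked[:k]
```

Because the post embeddings already exist when the request arrives, the per-request cost of this stage is one short dot product per candidate plus a sort.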

Follow-ups

  • What metrics besides CTR? (Dwell time, like-after-read, reports, diversity, political balance.)
  • Cold start: how do you rank for a user with only 3 interactions?
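One possible direction for the cold-start follow-up is to blend the personalized score with a non-personalized prior, trusting personalization more as interactions accumulate. The sketch below is an assumption-laden illustration, not an answer given in the source: the linear ramp, the popularity prior, and the names `blend_weight` and `cold_start_score` are all hypothetical.

```python
def blend_weight(n_interactions: int, full_window: int = 10) -> float:
    """Trust in the personalized score: 0.0 with no history, 1.0 at a full window.

    ASSUMPTION: a linear ramp over the 10-interaction window used for the
    intent vector; a real system might use a different schedule.
    """
    return min(max(n_interactions, 0), full_window) / full_window


def cold_start_score(
    personalized: float,  # e.g. dot product with a sparse intent vector
    popularity: float,    # hypothetical non-personalized prior, same scale
    n_interactions: int,
    full_window: int = 10,
) -> float:
    """Convex combination of personalized and popularity scores."""
    w = blend_weight(n_interactions, full_window)
    return w * personalized + (1.0 - w) * popularity
```

For a user with 3 interactions, this ramp gives the personalized score a weight of 0.3 and the prior a weight of 0.7, so a noisy intent vector cannot dominate the ranking.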

Related study-guide topics