X4 · Design X For-You Re-ranking with Grok
Verified source
X's ranking algorithm has been open-sourced (github.com/twitter/the-algorithm), and xAI has stated that Grok will increasingly power ranking. Credibility B.
Problem
X has ~200M DAU, and each user refreshes their For-You timeline many times per day. Candidate generation produces ~1500 posts per request; final ranking must score them and return the top 20. Design an LLM-in-the-loop re-ranker that adds a Grok-based relevance signal without exceeding a 100ms latency budget.
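A rough load estimate helps size each stage before designing it. The sketch below is a back-of-envelope calculation, not a measured figure. The DAU, the 1500-candidate count and the 50 posts that reach the heavy scorer in the proposed funnel come from this card. REFRESHES_PER_DAY = 10 and PEAK_TO_AVG = 2.0 are hypothetical assumptions standing in for "many times per day".

```python
# Back-of-envelope load estimate for the For-You re-ranker.
# DAU and CANDIDATES come from the problem statement;
# REFRESHES_PER_DAY and PEAK_TO_AVG are hypothetical assumptions.

DAU = 200_000_000          # ~200M daily active users
REFRESHES_PER_DAY = 10     # hypothetical stand-in for "many times per day"
PEAK_TO_AVG = 2.0          # hypothetical peak-to-average traffic ratio
CANDIDATES = 1500          # posts per request from candidate generation
SECONDS_PER_DAY = 86_400


def load_estimate(heavy_stage_size: int) -> dict:
    """Estimate request rate and heavy-scorer volume for one funnel configuration."""
    avg_qps = DAU * REFRESHES_PER_DAY / SECONDS_PER_DAY
    peak_qps = avg_qps * PEAK_TO_AVG
    return {
        "avg_qps": avg_qps,
        "peak_qps": peak_qps,
        # Posts per second that reach the heavy Grok scorer at peak.
        "heavy_scores_per_sec": peak_qps * heavy_stage_size,
        # How much less work the heavy scorer does than scoring every candidate.
        "reduction_vs_full": CANDIDATES / heavy_stage_size,
    }


if __name__ == "__main__":
    est = load_estimate(heavy_stage_size=50)
    print(f"avg QPS:  {est['avg_qps']:,.0f}")
    print(f"peak QPS: {est['peak_qps']:,.0f}")
    print(f"heavy scores/sec at peak: {est['heavy_scores_per_sec']:,.0f}")
    print(f"heavy-stage reduction: {est['reduction_vs_full']:.0f}x")
```

With these assumed inputs, the estimate comes to roughly 23k requests per second on average and about 46k at peak. Because only 50 of the 1500 candidates reach the heavy stage, the full model scores 30 times fewer posts than it would without the funnel.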
Architecture
flowchart LR
    U[User opens X] --> CG[Candidate generator ~1500]
    CG --> L1[Lightweight tower model, top 200]
    L1 --> L2[Grok-mini embedding re-rank, top 50]
    L2 --> L3[Heavy Grok scorer, top 20]
    L3 --> TL[For-You timeline]
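The funnel can be sketched as a list of (scorer, keep_k) stages applied in order. This is a minimal illustration under stated assumptions. Post, funnel_rank, counted and demo are hypothetical names. The three scorers are random stand-ins for the tower model, the Grok-mini embedding re-rank and the heavy Grok scorer; they are not real X or xAI interfaces. The call counter shows that each stage scores only what the previous stage kept.

```python
from __future__ import annotations

import heapq
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Post:
    post_id: int


Scorer = Callable[[Post], float]


def funnel_rank(candidates: Sequence[Post], stages: Sequence[Tuple[Scorer, int]]) -> List[Post]:
    """Apply each (scorer, keep_k) stage in order, keeping the top keep_k posts."""
    survivors = list(candidates)
    for score, keep_k in stages:
        # Each stage scores only the survivors of the previous stage,
        # so the expensive scorers see progressively smaller inputs.
        survivors = heapq.nlargest(keep_k, survivors, key=score)
    return survivors


def counted(name: str, fn: Scorer, calls: Dict[str, int]) -> Scorer:
    """Wrap a scorer to record how many posts each stage actually scores."""
    def wrapped(post: Post) -> float:
        calls[name] = calls.get(name, 0) + 1
        return fn(post)
    return wrapped


def demo(seed: int = 0) -> Tuple[List[Post], Dict[str, int]]:
    rng = random.Random(seed)
    candidates = [Post(i) for i in range(1500)]
    # Hypothetical stand-in scores; a real system would call a tower model,
    # an embedding dot product, and a Grok scoring service here.
    tower = {p.post_id: rng.random() for p in candidates}
    embed = {p.post_id: rng.random() for p in candidates}
    grok = {p.post_id: rng.random() for p in candidates}
    calls: Dict[str, int] = {}
    stages = [
        (counted("tower", lambda p: tower[p.post_id], calls), 200),
        (counted("embedding", lambda p: embed[p.post_id], calls), 50),
        (counted("grok", lambda p: grok[p.post_id], calls), 20),
    ]
    return funnel_rank(candidates, stages), calls


if __name__ == "__main__":
    timeline, calls = demo()
    print(len(timeline), calls)
```

Keeping the stages as data makes it easy to tune the 200 and 50 cut-offs against the 100ms budget without touching the ranking loop.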
Key decisions
- Three-stage funnel: classic model → small LLM embedding → heavy LLM. LLMs only run on progressively smaller candidate sets.
- Precompute post embeddings at ingest time, so the embedding re-rank is a dot product against the user vector rather than a forward pass per post.
- Update a user intent vector from the last 10 interactions; this gives personalization without scanning the full profile.
- Only the final stage, which scores the 50 survivors to pick the top 20, invokes full Grok; it is the latency hot spot.
- Evaluation: run an online A/B test against the current heuristic stack, measuring dwell time and report rate rather than CTR alone.
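The two embedding decisions above, precomputed post embeddings and an intent vector built from the last 10 interactions, fit together as sketched here. All assumptions are hypothetical: the intent vector is a plain mean of the last 10 interacted-post embeddings, embeddings are small Python lists rather than model outputs, and IntentVector and embedding_rerank are illustrative names, not X or xAI APIs. Because post embeddings already exist at serving time, the embedding stage needs only a lookup and a dot product per candidate.

```python
from collections import deque
from typing import Deque, Dict, List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class IntentVector:
    """Hypothetical user intent: mean embedding of the last `window` interactions."""

    def __init__(self, dim: int, window: int = 10) -> None:
        self.dim = dim
        # maxlen makes the deque drop the oldest interaction automatically.
        self.recent: Deque[Vector] = deque(maxlen=window)

    def update(self, post_embedding: Sequence[float]) -> None:
        self.recent.append(list(post_embedding))

    def vector(self) -> Vector:
        if not self.recent:
            # No history yet: zero vector (a placeholder, not a cold-start strategy).
            return [0.0] * self.dim
        n = len(self.recent)
        return [sum(column) / n for column in zip(*self.recent)]


def embedding_rerank(
    intent: Sequence[float],
    candidate_ids: Sequence[int],
    post_embeddings: Dict[int, Vector],
    keep_k: int = 50,
) -> List[int]:
    """Score precomputed post embeddings by dot product and keep the top keep_k ids."""
    ranked = sorted(
        candidate_ids,
        key=lambda pid: dot(intent, post_embeddings[pid]),
        reverse=True,
    )
    return ranked[:keep_k]


if __name__ == "__main__":
    # Toy 2-d embeddings, standing in for vectors computed at ingest time.
    store = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [0.5, 0.5]}
    user = IntentVector(dim=2)
    for _ in range(12):
        user.update(store[2])  # mostly engages with post-2-like content
    user.update(store[1])      # one recent interaction with post-1-like content
    print(user.vector(), embedding_rerank(user.vector(), [1, 2, 3], store, keep_k=2))
```

A recency-weighted average or a learned update would be natural alternatives to the plain mean. The point of the sketch is only that no per-post forward pass happens at request time.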
Follow-ups
- What metrics besides CTR? (Dwell time, like-after-read, reports, diversity, political balance.)
- Cold start: how do you rank for a user with only 3 interactions?