How to use this guide

Each topic page is self-contained (1,200–2,500 words) and covers principles, trade-offs, concrete numbers, diagrams, and links back to the Arena questions that exercise the topic. Reading order doesn't matter, so start with your weakest area. Every page cites the original book and chapter so you can go deeper when needed.

① Foundations

Before any domain knowledge, you need a repeatable interview framework and the ability to do rapid numeric estimation on a whiteboard. These two skills separate candidates who "know system design" from those who can actually do it under time pressure.
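As an illustration of the kind of whiteboard estimation meant here, the following is a minimal sketch. The function names and the traffic figures (10 million daily users, 20 requests each, 1 KB per write) are hypothetical and chosen only to keep the arithmetic easy.

```python
SECONDS_PER_DAY = 86_400


def average_qps(daily_active_users: int, requests_per_user_per_day: float) -> float:
    """Average requests per second, assuming traffic is spread evenly over the day."""
    return daily_active_users * requests_per_user_per_day / SECONDS_PER_DAY


def daily_storage_gb(writes_per_day: int, bytes_per_write: int) -> float:
    """New data written per day in decimal gigabytes (1 GB = 1e9 bytes)."""
    return writes_per_day * bytes_per_write / 1e9


# Hypothetical workload: 10M daily users, 20 requests each, 1 KB stored per request.
qps = average_qps(10_000_000, 20)
storage = daily_storage_gb(10_000_000 * 20, 1_000)
print(f"average QPS ~ {qps:.0f}, new storage ~ {storage:.0f} GB/day")
```

On a whiteboard the same calculation is usually done by rounding 86,400 up to 10^5 seconds per day, which gives roughly 2,000 QPS. Peak load is typically estimated as a small multiple of the average.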

② Distributed Systems Core

These six topics underpin every "Design X" question. They are also where interviewers probe most aggressively for depth: cleanly articulating CAP, linearizability vs. serializability, quorum math, and LSM-tree vs. B-tree trade-offs signals senior-level thinking.
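For readers unsure what "quorum math" refers to, here is a minimal sketch of the standard overlap condition for a store with N replicas, read quorum R, and write quorum W. The function names are hypothetical; the condition R + W > N is the usual textbook rule, not something specific to this guide.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True if every read quorum shares at least one replica with every write quorum."""
    return r + w > n


def tolerated_failures(n: int, r: int, w: int) -> int:
    """Replicas that can be down while both reads and writes can still reach a quorum."""
    return n - max(r, w)


# A common configuration: N=3, R=2, W=2.
print(quorums_overlap(3, 2, 2), tolerated_failures(3, 2, 2))
# A fast but weaker configuration: N=3, R=1, W=1 (reads may miss the latest write).
print(quorums_overlap(3, 1, 1), tolerated_failures(3, 1, 1))
```

The trade-off to articulate is that lowering R or W improves latency and availability but gives up the guarantee that a read quorum sees the most recent acknowledged write.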

③ Classical System Designs

Even AI-first companies ask these: OpenAI's Slack question, Anthropic's chat service question, webhook platforms. The fundamentals compound with the AI layer.

④ LLM Systems (the core differentiator)

This is what OpenAI and Anthropic care about most. If you can go deep on only one section of this guide, make it this one. Expect questions about serving internals (KV cache, continuous batching, speculative decoding), RAG, agents, evals, and distributed training.

⑤ ML System Design

Classical ML is still part of the loop, especially ranking, recommendation, and moderation. Chip Huyen's Designing Machine Learning Systems is the backbone of this section.

⑥ Safety & Alignment Engineering

Most prominent at Anthropic and increasingly at OpenAI: expect at least one question here. Topics include Constitutional AI, jailbreak defence, red-teaming infrastructure, and content moderation pipelines.