① Canonical Books, Ranked

Eight books covering distributed systems, ML systems, LLM systems, agents, and the interview process itself. Rank reflects leverage per hour of reading for OpenAI/Anthropic prep, not general quality.

1. Designing Data-Intensive Applications · Martin Kleppmann · O'Reilly · 2017
   Why: The distributed-systems bible. Every deep-dive at either company expects DDIA-level mental models.
   Key chapters: Ch.5 (Replication), Ch.7 (Transactions), Ch.9 (Consensus), Ch.11 (Streams)
   Best for: distributed systems

2. Designing Machine Learning Systems · Chip Huyen · O'Reilly · 2022
   Why: The canonical ML-system lifecycle reference. The terminology in most ML interviews comes directly from this book.
   Key chapters: Ch.7 (Deployment), Ch.8 (Drift), Ch.9 (Continual Learning), Ch.10 (MLOps)
   Best for: ML systems

3. System Design Interview Vol. 1 · Alex Xu · 2nd ed. · 2020
   Why: The interview-ready templates, and the best source of practice answers with clear diagrams.
   Key chapters: Ch.3 (framework), Ch.4 (rate limiter), Ch.6 (KV store), Ch.11 (news feed)
   Best for: interview templates

4. System Design Interview Vol. 2 · Alex Xu & Sahn Lam · 2022
   Why: Sequel to Vol. 1 covering the real-time, streaming, payments, and storage templates missing from the original: proximity service, metrics/alerting, ad aggregation, an S3-like object store.
   Key chapters: Ch.4 (Message Queue), Ch.5 (Metrics/Alerting), Ch.6 (Ad Click Aggregation), Ch.9 (S3-like Object Storage)
   Best for: streaming · storage

5. Agentic Design Patterns · Antonio Gulli · Springer · 2024
   Why: The canonical taxonomy of agent patterns: routing, reflection, MCP, A2A, guardrails. Directly relevant to Anthropic agent questions.
   Key chapters: Ch.5 (Tool Use), Ch.7 (Multi-Agent), Ch.14 (RAG), Ch.18 (Guardrails)
   Best for: agents · safety

6. Acing the System Design Interview · Zhiyong Tan · Manning · 2024
   Why: The best single source on the interview process itself: NFRs, reflection, self-assessment, functional partitioning.
   Key chapters: Ch.1-3 (framework + NFRs), Ch.4 (DB scaling), Ch.13 (CDN), Ch.16 (feed)
   Best for: interview process

7. Machine Learning Design Interview · Khang Pham · 2022
   Why: Case-by-case ML architectures from YouTube, Feed, Airbnb, LinkedIn. Complements Chip Huyen.
   Key chapters: Ch.2 (primer), Ch.3 (YouTube), Ch.4 (feed), Ch.7 (Airbnb), Ch.8 (search)
   Best for: ML case studies

8. ByteByteGo Big Archive 2023 · Alex Xu (compilation) · 2023
   Why: A visual cheat-sheet for breadth recall: latency numbers, load balancing, DB sharding, real-world tech stacks.
   Key sections: latency numbers; LB algorithms; DB sharding; Kafka deep dive; Netflix/Uber stacks
   Best for: visual recall

② Essential Blogs & Newsletters

Books give you structure; blogs give you freshness. These eight cover 90% of what a prepared candidate cites in 2026.

Chip Huyen

huyenchip.com — long-form essays on ML systems, the LLM stack, and agent evals; author of Designing Machine Learning Systems.

Eugene Yan

eugeneyan.com — pragmatic ML patterns from Amazon; especially strong on RAG, evaluation, and product ML.

Hamel Husain

hamel.dev — hands-on LLM evals, fine-tuning field notes, and practical agent debugging.

Anthropic Engineering

anthropic.com/news — Constitutional AI, the Responsible Scaling Policy, interpretability, tool use. Required reading before any Anthropic round.

OpenAI Engineering

openai.com/blog — model launches, system cards, the Preparedness framework, API best practices.

High Scalability

highscalability.com — a decade of "how X scaled" case studies across Netflix, Discord, Reddit, WhatsApp.

AWS / GCP Architecture

aws.amazon.com/blogs/architecture & cloud.google.com/blog — reference architectures used in production by Fortune 500 customers.

The Batch

deeplearning.ai/the-batch — Andrew Ng's weekly summary of ML/LLM news, with editorial commentary.

③ Courses

Pick one paid course; do not buy multiple. They overlap heavily, and the returns on a second course are low.

④ Interview Practice Platforms

Cross-reference what interviewers at OpenAI and Anthropic have actually been asking over the last three months. Signal-to-noise varies; use multiple platforms.

⑤ Key Papers to Memorise

You do not need to have read each one end-to-end, but you must know the core claim, the key numbers, and the one diagram each paper is famous for. Interviewers love "so what's the intuition behind X?" questions.

Dynamo (2007): Eventually consistent KV store; consistent hashing + quorums + vector clocks. Alex Xu Ch.6 is a direct descendant.
Bigtable (2006): LSM-tree-based wide-column store; direct ancestor of HBase, Cassandra, ScyllaDB.
Kafka (2011): Durable, partitioned, replayable log. The default answer to "how do your services talk?"
MapReduce (2004): The batch-processing abstraction that launched the entire big-data ecosystem.
Raft (2014): Understandable consensus. Know leader election, log replication, and the safety properties.
vLLM / PagedAttention (2023): Kwon et al. KV cache managed as paged memory; 2-4x throughput over HuggingFace. Foundational for LLM-serving interviews.
FlashAttention (2022): Dao et al. IO-aware attention kernel; linear memory instead of quadratic. Enables long-context training.
Speculative Decoding (2023): Leviathan et al. A draft-verify loop gives 2-3x decode speedup without quality loss.
Constitutional AI (2022): Bai et al. Self-critique against written principles; the foundation of Anthropic's alignment stack.
GPT-3 (2020): Brown et al. In-context / few-shot learning at 175B scale. The paper that changed the field.
GPT-4 Technical Report (2023): OpenAI. Predictable scaling, system-card structure, and the "model spec" approach.
Megatron-LM (2019): Shoeybi et al. The tensor-parallelism recipe still used everywhere in 2026.
ZeRO / FSDP (2019): Rajbhandari et al. Optimiser / gradient / parameter sharding; lets you fit 100B+ models on commodity clusters.
Chinchilla (2022): Hoffmann et al. Compute-optimal scaling: roughly 20 tokens per parameter. Overturned GPT-3 intuitions.
Scaling Laws (2020): Kaplan et al. Predictable loss vs compute/params/data; the planning tool for all foundation-model teams.
LoRA (2021): Hu et al. Low-rank adapters; the default fine-tuning method in 2024-2026.
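The Chinchilla and Scaling Laws entries are the ones most often probed with arithmetic. A minimal back-of-envelope sketch using the two standard approximations, D ≈ 20·N tokens (Chinchilla's rule of thumb) and C ≈ 6·N·D training FLOPs; the exact constants are rounded conventions, not precise figures from the papers:

```python
def chinchilla_plan(params_b: float) -> tuple[float, float]:
    """Compute-optimal training plan for a model with `params_b`
    billion parameters: tokens via D ~= 20 * N, FLOPs via C ~= 6 * N * D."""
    n = params_b * 1e9        # parameter count
    d = 20 * n                # compute-optimal training tokens
    c = 6 * n * d             # approximate training FLOPs
    return d / 1e12, c        # tokens in trillions, total FLOPs

# Chinchilla itself: 70B parameters -> ~1.4T tokens
tokens_t, flops = chinchilla_plan(70)
```

This reproduces the paper's headline pairing of a 70B model with about 1.4T tokens, and is the kind of one-line arithmetic interviewers expect you to do unprompted.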
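For Dynamo, interviewers usually want the consistent-hashing intuition rather than the full protocol. A minimal sketch of a hash ring with virtual nodes; the node names and vnode count are illustrative choices, not details from the paper:

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Stable hash so placement is reproducible across runs.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Each physical node owns many virtual points on the ring, so
    adding or removing a node only remaps keys on that node's arcs."""

    def __init__(self, nodes, vnodes: int = 100):
        self._ring = sorted((_hash(f"{n}#{i}"), n)
                            for n in nodes for i in range(vnodes))
        self._points = [h for h, _ in self._ring]

    def node_for(self, key: str) -> str:
        # First virtual point clockwise of the key's hash (wrapping).
        i = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[i][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("user:42")  # deterministic for a fixed node set
```

Adding a fourth node moves only the keys whose nearest clockwise point now belongs to it; everything else stays put. That incremental-remapping property is exactly what Dynamo (and Alex Xu Ch.6) leans on.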
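Speculative decoding's draft-verify loop is easy to whiteboard. A greedy-only toy sketch: the real algorithm in Leviathan et al. verifies with rejection sampling over the two models' token distributions, and `draft` / `target` here are stand-in callables, not a real API:

```python
def speculative_step(draft, target, prefix, k: int = 4):
    """One greedy speculative-decoding step: the cheap `draft` model
    proposes k tokens sequentially; the expensive `target` model checks
    them (a single batched forward pass in practice), keeps the longest
    agreeing prefix, and contributes its own token at the first mismatch."""
    ctx = list(prefix)
    proposed = []
    for _ in range(k):                    # cheap sequential drafting
        proposed.append(draft(ctx + proposed))
    accepted = []
    for t in proposed:                    # verification
        want = target(ctx + accepted)
        accepted.append(want)             # target's token always wins
        if want != t:                     # mismatch: stop accepting drafts
            break
    return accepted
```

When the draft agrees, one expensive verification pass yields k tokens instead of one; that is the whole source of the 2-3x speedup, and quality is unchanged because every emitted token is the target's.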

⑥ GitHub Repos to Know

Skim the READMEs and scan one core file in each. Being able to say "I have actually looked at vLLM's scheduler" separates serious candidates from the rest.

vLLM

The PagedAttention + continuous-batching reference implementation. Read vllm/engine/llm_engine.py.

DeepSpeed

ZeRO / 3D-parallel training library. Study the ZeRO stage-3 docs.

Megatron-LM

NVIDIA's tensor-parallel reference for large LLMs; the canonical TP splits.

TensorRT-LLM

Production inference runtime on NVIDIA GPUs; fused kernels, quantisation, in-flight batching.

PyTorch FSDP

Fully Sharded Data Parallel inside PyTorch core. See torch/distributed/fsdp/.

Ray

The distributed task runtime powering many LLM training / serving stacks; Ray Serve for online serving.

llama.cpp

CPU and quantised inference reference. The go-to example when the interview turns to edge / on-device.

Anthropic Performance Take-home

github.com/anthropics/performance-takehome — Anthropic's public performance-engineering take-home; a direct window into their bar.