① Why this site exists ① 为什么会有这个站

System design at OpenAI and Anthropic is meaningfully different from the FAANG archetype. At OpenAI, the coding loop has become brutally hard — long multi-part algorithmic problems on top of an LLM-infra system design round that expects you to talk about KV cache, continuous batching, and GPU economics in the same breath as Redis and Kafka. At Anthropic, the ML / LLM-infra depth is not optional, and there is an explicit safety gate: you can design the most elegant system in the world and still fail the loop for never mentioning misuse, jailbreaks, red-teaming, or evaluation. These are not the interviews you pass by grinding 20 Alex Xu drills. OpenAI 与 Anthropic 的系统设计面试与 FAANG 那套模板差异显著。OpenAI 的 coding 环节极其硬核——长篇多段的算法题叠加一轮 LLM 基础设施系统设计,面试官会希望你在谈 Redis、Kafka 的同时,自然带出 KV cache、continuous batching 与 GPU 成本结构。Anthropic 这边,ML / LLM 基建的深度不是可选项,并且有一道明确的安全关卡:你可以把系统画得再漂亮,只要全程不提 misuse、越狱、红队、评估,这一轮依然会被判不过。这不是刷 20 道 Alex Xu 就能通过的面试。

When I started preparing, I found no single source that did what I needed. The Chinese note-sharing sites leaned on secondhand summaries. English bootcamp content stopped at FAANG-era "design Twitter". Books went deep but were scattered — DDIA for distributed systems, Chip Huyen for ML platform, Gulli and Huyen again for agentic patterns, Alex Xu volumes for the framework, ByteByteGo for the visual shorthand, Acing the System Design Interview for the interview-mechanics layer, and the newer Machine Learning System Design Interview for end-to-end ML product rounds. And nobody combined that reading with a question bank that was actually labelled with credibility — who said they were asked this, where, and when. 开始备考时我发现,没有任何一个单点能覆盖我需要的东西。中文笔记站大多依赖二手总结,英文 bootcamp 内容还停留在 FAANG 时代的「设计 Twitter」,而书籍深度够却零散——DDIA 讲分布式、Chip Huyen 讲 ML 平台、Gulli 与 Huyen 又共同覆盖 agentic patterns、Alex Xu 两卷给出框架、ByteByteGo 提供视觉速记、《Acing the System Design Interview》处理面试机制、新出的《Machine Learning System Design Interview》补上端到端 ML 题型。更没有人把这些阅读材料与一份带「可信度标注」的题库拼在一起——谁在什么时间、什么场合说自己被问过这道题。

This site is the thing I wished existed: a verified-provenance arena of real OpenAI and Anthropic questions (2023–2026), topic pages that synthesise the eight books into interview-executable answers, and a study plan that sequences the whole mess into eight weeks. It is opinionated, it is bilingual, and it optimises for one thing: you walking into the loop able to drive a 45-minute system design round cold. 这个站就是我当时想要、却找不到的东西:一份带出处核实的 OpenAI / Anthropic 真题 Arena(2023–2026),把八本书压缩成面试可直接使用的专题页,再用一份八周学习计划把这摊内容串成线性路径。它有立场、支持中英双语,并且只为一件事服务:你走进面试室时,能够脱稿把一场 45 分钟的系统设计拉满。

② Methodology — how questions were collected and rated ② 方法论——题目如何收集与评级

Every question in the arena carries a source link and a credibility grade. We treat candidate self-reports differently from a company's own publication, and the letter on the card tells you which is which at a glance. Arena 中每道题都附上出处链接与可信度评级。公司自家发布的内容与匿名自述显然不应一视同仁,卡片右上角的字母可以让你一眼分辨两者。

The raw pool was assembled from 2023–2026 interview reports across LeetCode Discuss, Blind, Exponent's question index, PracHub's onsite bank, Glassdoor, Jointaro mock-interview logs, GitHub repos of leaked take-homes, and a handful of open-source interview guides published directly by the companies themselves. Each candidate report was cross-referenced: if three independent posts on Blind / LeetCode / Jointaro converged on roughly the same prompt in the same quarter, we treated that as one corroborated question and merged the variants. Questions with a single anonymous source were kept but visibly downgraded. Questions we could not source to a real interview within the 2023–2026 window were cut — there are no filler "just in case" problems. 原始题池汇总自 2023–2026 年的面经,来源包括:LeetCode Discuss、Blind、Exponent 题库、PracHub 的 Onsite 题集、Glassdoor、Jointaro 模拟面试记录、GitHub 上的泄露 take-home,以及公司自己公开的少量面试指南。每条候选人自述都要交叉核对:如果同一季度在 Blind、LeetCode、Jointaro 上有三条独立帖子大致指向同一题目,则合并为一条、统一变体。只有单一匿名来源的题目保留但显著降级。无法回溯到 2023–2026 真实面试的题目一律删除——没有「以防万一」的灌水题。
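The corroboration rule above can be sketched as a small merge pass. This is an illustrative reconstruction, not the actual pipeline; `normalize` is a deliberately crude stand-in for whatever fuzzy prompt-matching was really used, and every function name here is invented.

```javascript
// Merge raw interview reports into corroborated arena entries.
// Rule from the text: >= 3 independent posts, same quarter, roughly the
// same prompt => one corroborated question; fewer => flagged as low-source.
function normalize(prompt) {
  // Crude stand-in for real fuzzy matching: lowercase, strip punctuation.
  return prompt.toLowerCase().replace(/[^a-z0-9 ]/g, "").replace(/\s+/g, " ").trim();
}

function corroborate(reports) {
  const groups = new Map(); // "quarter|normalized prompt" -> merged entry
  for (const r of reports) {
    const key = `${r.quarter}|${normalize(r.prompt)}`;
    const g = groups.get(key) ?? { prompt: r.prompt, quarter: r.quarter, sources: new Set() };
    g.sources.add(r.source); // a Set, so one platform only counts once
    groups.set(key, g);
  }
  return [...groups.values()].map((g) => ({
    prompt: g.prompt,
    quarter: g.quarter,
    sources: [...g.sources],
    corroborated: g.sources.size >= 3,
  }));
}
```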

Credibility scale S / A / B / C / D可信度评级 S / A / B / C / D

| Grade评级 | Source type出处类型 | Example示例 | How to treat it如何对待 |
| --- | --- | --- | --- |
| S | Official / first-party — the company's own publication or interview guide.官方 / 第一手——公司自己发布的内容或面试指南。 | Anthropic's public interview guide, OpenAI engineering blog describing real production trade-offs they ask about.Anthropic 公开的面试指南;OpenAI 工程博客中描述他们会追问的生产权衡。 | Treat as ground truth. If S says "we ask about X", they ask about X.视作事实。S 级说「会问 X」,就是会问。 |
| A | Paper or official open source — peer-reviewed or maintained by the company.论文或官方开源——同行评审或公司维护。 | arXiv papers on Constitutional AI, github.com/anthropics, OpenAI's evals repo, vLLM paper.Constitutional AI 的 arXiv 论文、github.com/anthropics、OpenAI evals 仓库、vLLM 论文。 | Near-ground-truth for technical content, not for interview phrasing.技术内容近似事实,但不代表面试官原话。 |
| B | Structured question bank marked "Asked at" a given company.结构化题库且标注「Asked at」某公司。 | PracHub onsite listings, Exponent curated library, paid interview platforms with moderation.PracHub onsite 列表、Exponent 精选题库、有审核的付费面试平台。 | Trustworthy direction; exact wording may be paraphrased.方向可信,但原文可能已被改写。 |
| C | Anonymous candidate self-report.匿名候选人自述。 | LeetCode Discuss, Blind posts, Jointaro anonymised mock logs, 1point3acres threads.LeetCode Discuss、Blind 帖子、Jointaro 匿名模拟记录、一亩三分地帖子。 | Useful signal, especially when multiple independent posts agree. Single-source C's get a warning badge.有参考价值,尤其在多源吻合时。单一来源的 C 会带警示标。 |
| D | General blog or secondary re-posting without fresh evidence.未提供新证据的二手博客或转帖。 | Medium summaries, WeChat gongzhonghao posts quoting a Blind thread from two years ago.Medium 总结、引用两年前 Blind 帖子的微信公众号文章。 | Kept only when corroborating something else; never a sole source for an arena entry.仅在佐证其它证据时保留;绝不作为唯一来源。 |

Roughly 15% of the current arena sits at S/A, 40% at B, 40% at C with multi-source corroboration, and the residual 5% at single-source C marked with a visible warning. Anything D-only was cut. The grade is printed on every arena card so you can budget your attention accordingly — burn your mock-interview time on S and A first. 当前 Arena 中大约 15% 属 S/A,40% 属 B,40% 属于多源互证的 C,剩余 5% 是带明显警告的单源 C。仅 D 级的一律剔除。每张 Arena 卡片都会标注评级——把有限的模拟面试时间优先花在 S 与 A 上。

③ The 8-week study plan ③ 八周备考计划

Budget roughly 12–15 hours per week. The plan assumes a working engineer with solid coding fundamentals and some distributed systems intuition, but no specific LLM-infra background. Each week pairs reading with at least two arena attempts — reading alone does not build the verbal motor skill you need. 每周预算约 12–15 小时。计划针对有扎实编码功底、有一定分布式系统直觉、但缺乏 LLM 基建专项经验的在职工程师设计。每周都把阅读与至少两道 Arena 实战搭配——光读不练,练不出面试现场需要的「口头肌肉」。

```mermaid
timeline
  title 8-Week System Design Prep (12-15 hrs / week)
  Week 1-2 : Foundations and Framework
           : Estimation drills
           : 3 Alex Xu + 3 OpenAI classical arena
  Week 3-4 : Distributed Systems Deep Dive
           : DDIA Ch.5-9 (replication, partitioning, consensus, txns)
           : Pair each chapter with an arena question
  Week 5-6 : LLM Systems
           : Gulli + Chip Huyen (serving, RAG, agentic, training, eval)
           : Drill Anthropic arena questions
  Week 7   : ML and Safety + Mock Interviews
           : Constitutional AI + red-team + eval pipelines
           : Mocks with friend or Exponent
  Week 8   : Polish and Dress Rehearsals
           : 2-3 behavioural stories incl. safety-catch
           : Full 45-min rehearsals, frameworks cold
```
| Week周次 | Focus重点 | Reading阅读 | Arena drillsArena 练习 | Hrs小时 |
| --- | --- | --- | --- | --- |
| 1 | Framework + estimation框架 + 估算 | Alex Xu V1 Ch.1–3; ByteByteGo numbers cheat sheet.Alex Xu V1 第 1–3 章;ByteByteGo 数字速查。 | 3 Alex Xu classical designs solo-timed at 45 min.3 道 Alex Xu 经典题,45 分钟计时独立完成。 | 12 |
| 2 | Classical SD under OpenAI lensOpenAI 视角下的经典题 | Acing the SD Interview Ch.1–4; revisit framework.《Acing the SD Interview》第 1–4 章;回看框架。 | 3 OpenAI classical arena questions (rate-limiter, pastebin-style, notification).3 道 OpenAI 风格经典题(限流器、Pastebin 类、通知系统)。 | 13 |
| 3 | Replication & partitioning复制与分区 | DDIA Ch.5–6.DDIA 第 5–6 章。 | 2 arena questions where quorum / consistent hashing are central.2 道以 quorum / 一致性哈希为核心的 Arena。 | 14 |
| 4 | Transactions & consensus事务与共识 | DDIA Ch.7–9; Raft paper skim.DDIA 第 7–9 章;略读 Raft 论文。 | 2 arena questions where isolation level or leader election matters.2 道以隔离级别或 leader 选举为关键的题目。 | 14 |
| 5 | LLM serving + RAGLLM 推理 + RAG | Gulli Ch. on serving + RAG; Chip Huyen DMLS serving chapter.Gulli 推理与 RAG 相关章节;Chip Huyen《DMLS》推理章。 | 3 Anthropic arena questions: inference service, RAG, long-context eval.3 道 Anthropic Arena:推理服务、RAG、长上下文评估。 | 15 |
| 6 | Agentic + training infraAgent 与训练基建 | Agentic Design Patterns (full); Chip Huyen on training-data pipelines.《Agentic Design Patterns》全书;Chip Huyen 训练数据流水线相关章节。 | 2 arena questions on agent orchestration / tool-use + 1 on distributed training.2 道 agent 编排 / 工具调用题 + 1 道分布式训练题。 | 15 |
| 7 | ML platform + safety + mocksML 平台 + 安全 + 模拟面试 | ML SD Interview book (selected); Anthropic safety posts; Constitutional AI paper.《ML SD Interview》选读;Anthropic 安全博客;Constitutional AI 论文。 | 2 live mocks with a friend or Exponent coach; 1 safety-heavy arena question solo.与朋友或 Exponent 教练做 2 次真人 mock;独立完成 1 道安全主导的 Arena。 | 14 |
| 8 | Polish & dress rehearsals打磨与彩排 | Re-read your own notes; refine 2–3 behavioural stories including one "caught-a-safety-issue" story.重读自己的笔记;打磨 2–3 个行为面故事,含一则「发现并阻止一个安全问题」。 | Two full 45-minute dress rehearsals, walking the frameworks cold from memory.两次完整 45 分钟彩排,脱稿默画框架。 | 12 |

④ How to use this site ④ 如何使用这个站

Don't feel obliged to read the Study Guide linearly. Start from whatever your weakest area is — probably LLM serving or safety if you're coming from a traditional backend background, probably distributed consensus if you're coming from an ML background. The one rule: never call a topic page "done" until you have attempted its linked arena question. Reading without drilling is how people walk into the loop with the illusion of competence. 不必按顺序通读学习手册。从最弱的一项切入即可——后端背景的同学多数最弱在 LLM 推理与 safety,ML 背景的同学往往栽在分布式共识上。唯一的硬性规则:只有把专题页关联的 Arena 真题至少做过一道,才算「这页学完」。只读不做,就是带着「假性掌握」走进面试的典型路径。

Mode A — Cram (2 weeks out)模式 A — 冲刺(面试前两周)

Triage the arena by company tag, skim only the companion topic pages of the questions you hit, and force yourself through 3 full 45-minute mock questions — ideally one OpenAI classical, one Anthropic LLM-infra, one safety-heavy.按公司 tag 对 Arena 做分流,只略读命中题目所关联的专题页,强迫自己完整做 3 道 45 分钟 mock——最好是一道 OpenAI 经典、一道 Anthropic LLM 基建、一道安全主导题。

Mode B — Deep prep (8 weeks)模式 B — 深度备考(八周)

Follow the 8-week plan above. This is the mode that gets you from "competent backend engineer" to "actually capable of driving an Anthropic round" — not faster.按上面的八周计划执行。这是从「合格的后端工程师」走到「真的能撑起 Anthropic 一轮」的模式,没法再压缩。

Mode C — Continuous模式 C — 长期维护

One topic page per week alongside your day job, rotating across foundations, distributed systems, LLM, ML, and safety. This compounds over a year into genuinely senior-level depth.在职期间每周啃一页专题,在基础、分布式、LLM、ML、安全之间轮换。一年积累下来,能达到真正的 senior 级深度。

Practical tip实操建议

Keep a personal answer-template per topic as you go — your own 90-second opener for "design an LLM inference service", "design a RAG system", "design a rate limiter". Having these cold saves 5–7 minutes in the real interview for the interesting parts.边学边维护一份属于你自己的答题模板——为「设计 LLM 推理服务」「设计 RAG」「设计限流器」各写一段 90 秒开场白。把它们背熟,真实面试里就能给后续深挖节省 5–7 分钟。

⑤ Evaluation rubric — what they actually grade ⑤ 评分维度——他们真正在打分的东西

These are the six signals introduced on the home page, expanded here with "bad answer" vs "good answer" phrasing. Interviewers at OpenAI and Anthropic take structured notes against dimensions that map, roughly, to these six. 这是首页六个信号的展开版,每个信号都给出「差答案」与「好答案」的具体语感。OpenAI 与 Anthropic 的面试官打分表大致就是这六个维度的变体。

1. Abstraction1. 抽象能力

Can you carve a fuzzy prompt into clean subsystems, name each, and articulate the contract between them? Strong candidates introduce names like "scheduler", "KV cache manager", "policy filter" before they draw a box. Weak candidates draw boxes first, then retrofit names.能否把一个模糊需求切成边界清晰的子系统、命名、讲清彼此契约?强候选人会先说「我把它切成 scheduler、KV cache manager、policy filter」画框;弱候选人先画框再硬塞名字。

Bad: "I'll put an API gateway here, then a backend, then a database."差:「我画个 API gateway,再画个后端,再画个数据库。」

Good: "There are four concerns — ingress and auth, request scheduling under variable-length workloads, model execution on GPUs, and safety mediation. Let me solve each separately and then compose them."好:「这里有四个关注点——入口与鉴权、变长负载下的请求调度、GPU 上的模型执行,以及安全中介。我会分别解决,再组合起来。」
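To make the "four concerns" answer concrete, the same decomposition can be written as a toy pipeline. These are purely illustrative stubs; every name (ingress, scheduler, modelExec, safetyMediation) is invented for this example.

```javascript
// Each subsystem is a named function with an explicit contract; composing
// them makes every boundary a place you can test, monitor, or swap out.
const ingress = (raw) => ({ user: raw.user, prompt: raw.prompt, authed: true }); // auth + validation
const scheduler = (req) => ({ ...req, batchId: 1 });                             // admission + batching stub
const modelExec = (req) => ({ ...req, output: `echo:${req.prompt}` });           // GPU execution stub
const safetyMediation = (res) =>                                                 // policy filter on the way out
  res.output.includes("banned") ? { ...res, output: "[blocked]" } : res;

// Name the stages first, compose them afterwards.
const handle = (raw) => safetyMediation(modelExec(scheduler(ingress(raw))));
```

Naming the stages before drawing a single box is exactly the move the "good answer" makes verbally.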

2. Trade-off articulation2. 权衡表达

Every design decision costs something. The signal is whether you volunteer the cost before the interviewer has to drag it out of you. "Consistent hashing lets us rebalance cheaply but hurts range queries and complicates multi-key transactions" is what the rubric rewards.每个设计决定都有代价。打分看的是你是否主动说出代价,而不是等面试官挖。像「一致性哈希让再平衡便宜,但代价是范围查询变差、多 key 事务复杂化」,这才是评分表想听到的。

Bad: "I'll use Redis because it's fast."差:「我用 Redis,因为它快。」

Good: "Redis for hot cache gives us sub-ms reads and simple invalidation, at the cost of memory footprint and a weaker durability story — I'll pair it with Postgres as the source of truth and accept the cache-consistency window."好:「热缓存走 Redis 拿到亚毫秒读和简单失效,代价是内存开销与较弱的持久性——因此配 Postgres 作为事实源,并接受一段缓存一致性窗口。」
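A minimal cache-aside sketch of that Redis-plus-Postgres answer, with in-memory Maps standing in for both stores (all names illustrative); the TTL is what bounds the cache-consistency window the answer accepts:

```javascript
// Cache-aside read path: hot cache (Redis stand-in) in front of the
// source of truth (Postgres stand-in). TTL bounds staleness.
class CacheAside {
  constructor(db, ttlMs) {
    this.db = db;           // source of truth
    this.cache = new Map(); // key -> { value, expiresMs }
    this.ttlMs = ttlMs;
  }
  read(key, nowMs) {
    const hit = this.cache.get(key);
    if (hit && hit.expiresMs > nowMs) return hit.value; // sub-ms hot path
    const value = this.db.get(key);                     // fall through to the DB
    this.cache.set(key, { value, expiresMs: nowMs + this.ttlMs });
    return value;
  }
  write(key, value) {
    this.db.set(key, value); // durable write first
    this.cache.delete(key);  // simple invalidation; next read repopulates
  }
}
```

Out-of-band writes that skip `write()` stay invisible until the TTL expires, which is precisely the consistency window being traded away.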

3. Failure-mode reasoning3. 失败模式推理

What breaks first under load, partial failure, or adversarial input? Great candidates name the failure mode before it happens: "head-of-line blocking on the scheduler queue when a long-prefill request lands alongside decodes; I'd use chunked prefill and separate queues."在高负载、部分失败或对抗输入下,哪里最先坏?优秀候选人会在故障发生前主动命名它:「当长 prefill 请求和 decode 撞在 scheduler 队列里时会队头阻塞;我会用 chunked prefill 加独立队列。」

Bad: "If the database goes down, we return an error."差:「数据库挂了就返回错误。」

Good: "The first failure mode I worry about is replication lag on the follower when a write burst hits — stale reads, potential read-your-writes violation. I'd route a user's reads to the leader by user-id stickiness, and monitor replica lag SLI."好:「我最先担心写突发时 follower 的复制延迟——会出现陈旧读、可能违反 read-your-writes。我按 user-id 粘性把「你自己的读」路由到 leader,并监控复制延迟 SLI。」
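One common way to deliver that user-id stickiness is a per-user "recent write" window, sketched here under the assumption that replica lag stays within the window; class and field names are invented:

```javascript
// Read-your-writes routing: after a user writes, pin that user's reads to
// the leader until replicas have had time to catch up.
class ReadRouter {
  constructor(stickinessMs) {
    this.stickinessMs = stickinessMs; // should exceed worst-case replica lag
    this.lastWriteMs = new Map();     // userId -> last write timestamp
  }
  recordWrite(userId, nowMs) {
    this.lastWriteMs.set(userId, nowMs);
  }
  routeRead(userId, nowMs) {
    const last = this.lastWriteMs.get(userId);
    if (last !== undefined && nowMs - last < this.stickinessMs) return "leader";
    return "replica"; // safe once the stickiness window has elapsed
  }
}
```

Only recent writers pay the leader-read cost; everyone else keeps scaling on replicas.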

4. Scale reasoning4. 规模推理

Back-of-envelope numbers, and whether they pass the smell test. If you say "10k QPS" and the interviewer asks "per shard or total?", you should already have the answer. Cost per 1M tokens, GPU memory per request, bytes per row — these should fall out of your mouth without hesitation.快速估算,以及估算结果是否合常识。如果你说「10k QPS」,面试官追问「每个分片还是总的?」,你应已备好答案。每百万 token 成本、每请求 GPU 显存、每行字节数——这些数字应该脱口而出。

Bad: "I'll shard it so it scales."差:「我分片一下就能扩。」

Good: "Prefill is ~100 TFLOPs per 1k-token prompt on a 70B model, so an A100 at ~300 TFLOPs sustained can do roughly 3 prefills/sec. With 1k concurrent users averaging one request per 10s, I need ~33 A100s for prefill alone — call it 40 for headroom."好:「70B 模型上每 1k token 的 prefill 约 100 TFLOPs,一块持续 ~300 TFLOPs 的 A100 每秒能做约 3 次 prefill。1k 并发用户平均每人 10 秒一请求,仅 prefill 就需要 ~33 张 A100,留余量按 40 张算。」
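The arithmetic in that answer, written out using the answer's own round numbers (~100 TFLOPs per 1k-token prefill, ~300 TFLOPs sustained per A100), which are illustrative estimates rather than measured figures:

```javascript
// Back-of-envelope GPU sizing for prefill only.
const prefillTflopsPerRequest = 100; // ~100 TFLOPs per 1k-token prompt on a 70B model
const sustainedTflopsPerGpu = 300;   // assumed sustained A100 throughput
const prefillsPerSecPerGpu = sustainedTflopsPerGpu / prefillTflopsPerRequest; // 3/s

const concurrentUsers = 1000;
const secondsBetweenRequests = 10;   // one request per user every 10 s
const offeredLoad = concurrentUsers / secondsBetweenRequests; // 100 req/s

const gpusForPrefill = offeredLoad / prefillsPerSecPerGpu; // ≈ 33.3 A100s
const fleetSize = 40; // the answer rounds up to 40 for burst headroom
```

Being able to rerun this chain aloud, with any of the inputs changed by the interviewer, is the actual signal being graded.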

5. Driving the conversation5. 主导对话

45 minutes is short. The interviewer wants to see you set the agenda, park tangents, call your own time, and ask them for preferences at decision points — not wait passively for prompts. "I want to spend 10 more minutes on the scheduler and then pivot to safety; does that split work for you?" is a senior move.45 分钟很短。面试官希望看你主动排议程、把岔路暂挂、自己报时、在决策点主动征询偏好,而不是被动等题。「我打算在 scheduler 上再花 10 分钟,然后切到 safety,你觉得这个比例合适吗?」是明显的高级信号。

Bad: Stays silent, waits for the next nudge.差:全程沉默,等下一句提示。

Good: "We've got 30 minutes left. The two deep-dives I'd pick are continuous batching and safety mediation — which do you want first?"好:「还有 30 分钟。我想挑两个深挖:continuous batching 与 safety 中介——你想先看哪个?」

6. Safety (the Anthropic gate)6. 安全(Anthropic 的准入门)

At Anthropic this is not a bonus — it is a gate. Any question can be failed by designing the perfect system and never mentioning misuse, jailbreaks, red-teaming, content policy, or evaluation. Even for an OpenAI classical system question, a one-sentence mention of abuse prevention at the end buys you a free signal.在 Anthropic,这不是加分项,而是门槛。任何一题都可能因为「把系统画完美、全程不提 misuse / 越狱 / 红队 / 内容政策 / 评估」而直接判负。即便是 OpenAI 的经典题,末尾加一句滥用防护也能白拿一个正向信号。

Bad: Never brings up the word "safety", "abuse", or "evaluation" unprompted.差:全程不主动提及「safety」「滥用」或「评估」。

Good: "Before we close — three safety hooks I'd wire in from day one: input policy filter, per-tenant rate limit tied to an abuse score, and an offline red-team eval pipeline that replays a curated jailbreak set against every model version before rollout."好:「收尾前再讲三个第一天就要接好的 safety 钩子:输入策略过滤、与滥用分数挂钩的按租户限流、以及在每次模型上线前用一组精选越狱集回放的离线红队评估流水线。」
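The "per-tenant rate limit tied to an abuse score" hook can be sketched as a token bucket whose refill rate shrinks as the abuse score rises. This is an illustrative design, not any company's actual implementation; all names and the score range [0, 1] are invented here.

```javascript
// Token-bucket limiter: refill rate scales down with the tenant's abuse score.
class AbuseAwareLimiter {
  constructor(baseRatePerSec, burst) {
    this.baseRate = baseRatePerSec;
    this.burst = burst;
    this.buckets = new Map(); // tenantId -> { tokens, lastRefillMs }
  }
  // abuseScore in [0, 1]: 0 => full budget, 1 => no refill at all.
  allow(tenantId, abuseScore, nowMs) {
    const rate = this.baseRate * (1 - abuseScore);
    let b = this.buckets.get(tenantId);
    if (!b) {
      b = { tokens: this.burst, lastRefillMs: nowMs };
      this.buckets.set(tenantId, b);
    }
    const elapsedSec = (nowMs - b.lastRefillMs) / 1000;
    b.tokens = Math.min(this.burst, b.tokens + elapsedSec * rate); // refill, capped at burst
    b.lastRefillMs = nowMs;
    if (b.tokens >= 1) {
      b.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

An abusive tenant still spends its initial burst, then starves, without any hard ban logic in the hot path.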

⑥ Tech stack of this site ⑥ 本站技术栈

Deliberately boring. Static HTML, one CSS file, one JS file, and Mermaid for diagrams — no framework, no build step, no tracker. Bilingual content is a pair of spans per visible string, flipped by toggling a class on the html element (see assets/js/app.js). Theme (light/dark) is stored in localStorage and applied on first paint to avoid a flash. 刻意保持朴素。静态 HTML + 一个 CSS 文件 + 一个 JS 文件 + Mermaid 画图——无框架、无构建步骤、无埋点。双语实现是每个可见字符串配一对 span,由根元素上的 class 切换(见 assets/js/app.js)。主题(明/暗)存于 localStorage,在首绘前应用以避免闪烁。
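A sketch of the two toggles described above; the real assets/js/app.js may differ, and the function and class names here are assumptions:

```javascript
// Theme: stored preference wins; otherwise follow the OS preference.
function resolveTheme(stored, prefersDark) {
  if (stored === "light" || stored === "dark") return stored;
  return prefersDark ? "dark" : "light";
}

// Language: one class on <html> flips every EN/CN span pair via CSS.
function nextLang(current) {
  return current === "zh" ? "en" : "zh";
}

// In the page this runs as a blocking <script> in <head>, so the class
// lands on <html> before the first paint and there is no theme flash.
if (typeof document !== "undefined") {
  const theme = resolveTheme(
    localStorage.getItem("theme"),
    window.matchMedia("(prefers-color-scheme: dark)").matches
  );
  document.documentElement.classList.toggle("dark", theme === "dark");
}
```

Keeping both toggles as pure functions plus one DOM side effect is what makes the no-framework approach stay maintainable.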

Total footprint: ~45 HTML files, ~1000 lines of CSS, ~200 lines of JS. The whole site fits in a single git clone and renders offline. Anyone can fork it, delete my text, keep the scaffolding, and have a personal interview-prep site in an afternoon. 整体体量:约 45 个 HTML 文件、约 1000 行 CSS、约 200 行 JS。整站一次 git clone 就能离线打开。任何人都可以 fork,删掉我的文字、保留骨架,一个下午就能搭起自己的面经备考站。

Why no framework?为什么不用框架?

Because the content, not the stack, is the value. A static site that loads in 80 ms on a cheap laptop respects the reader. Also, I wanted the markup I was writing to look identical to what a candidate would draft on a whiteboard.因为价值在内容而不在技术栈。在一台普通笔记本上 80 ms 就能加载完的站点,是对读者的尊重。另外,我希望自己写的 markup 与候选人在白板上会画的东西长得一样。

⑦ Changelog & roadmap ⑦ 更新日志与路线图

Status as of April 20262026 年 4 月状态

  • 36+ arena questions, all with sourced-and-graded provenance.36+ 道 Arena 真题,全部带有出处与评级。
  • 20 topic pages covering foundations, distributed systems, LLM infra, ML platform, and safety.20 个专题页,覆盖基础、分布式、LLM 基建、ML 平台与 safety。
  • 2 deep-research markdown reports (one on LLM serving, one on safety evaluation pipelines).2 份深度研究 markdown(一份关于 LLM 推理,一份关于 safety 评估流水线)。
  • Full Chinese parity on every page — not machine-translated.所有页面完整中文对齐——非机器翻译。

Roadmap路线图

  • Add 10 more Anthropic-flavoured batch & evaluation questions, with emphasis on the "how would you grade this model" framing.新增 10 道 Anthropic 风格的批处理与评估题,重点在「你会如何给这个模型打分」的叙事。
  • A mock-interview companion mode: timed prompts, checklist-based self-grading against the 6-signal rubric.上线模拟面试伴侣模式:计时出题,按六信号评分表做清单式自评。
  • English-Chinese system-design glossary — every term from CAP to PagedAttention with both phrasings.中英系统设计词表——从 CAP 到 PagedAttention,每个术语附上双语表达。
  • MCP-agents deep-dive page once the MCP ecosystem stabilises more.待 MCP 生态更稳定后,新增 MCP-agents 深度专题页。

⑧ Not affiliated · contact ⑧ 非官方声明 · 联系方式

Not affiliated非官方声明

SD-Guide is an independent study project. It is not affiliated with, endorsed by, or sponsored by OpenAI or Anthropic. Company names, trademarks, and question attributions are used solely for educational reference. Any errors are mine; any sensitive leaked material reported to us is removed on request.SD-Guide 是一个独立的学习项目,与 OpenAI 或 Anthropic 没有任何隶属、背书或赞助关系。公司名、商标与题目归属仅用于教学参考。任何错误由本站承担;如有涉及敏感或违规泄露内容,收到通知后立即删除。

Feedback, corrections, and new question reports are welcome. File a GitHub issue (placeholder: github.com/your-handle/sd-guide/issues) or email yourname@example.com. Please anonymise any candidate-identifying details before sending — respect the people whose reports power this site. 欢迎反馈、勘误与新题提交。可在 GitHub 提交 issue(占位:github.com/your-handle/sd-guide/issues),或发邮件至 yourname@example.com。投递前请抹去可识别候选人身份的信息——请尊重那些贡献了面经的候选人。