Why a framework at all

A system design interview is not an oral exam on distributed systems trivia; it is a structured conversation under time pressure where the interviewer scores your engineering judgment. Alex Xu's 4-step template (scope → high-level → deep-dive → wrap) and Zhiyong Tan's longer checklist in Acing the System Design Interview Chapter 2 both converge on the same idea: spend the first 10 minutes nailing the problem, the next 15 sketching a reasonable baseline, and the last 20 demonstrating depth on one or two components the interviewer cares about.

The framework below is the tightest version that still survives a 45-minute slot. It prevents the two most common failure modes: premature architecture (drawing boxes before you know the QPS) and shallow breadth (describing seven services at the same level of detail, none of them deeply).

Source cross-reference

Alex Xu V1 Ch.3 gives the 4-step skeleton; Acing SDI Ch.2 adds the requirement-discussion taxonomy (functional, non-functional, constraints); Chip Huyen DMLS Ch.2 supplies the ML-specific framing (objective → data → model → deployment) you'll need for any LLM/ML question.

Phase 1 — Requirements & scoping (5 min)

Before touching the whiteboard, extract three lists:

  • Functional requirements: the two or three core user journeys. "Upload video, watch video, search video" is enough for YouTube; do not list 15 features.
  • Non-functional requirements (NFRs): availability target (three-nines = ~8.76 h downtime/year, four-nines = ~52 min), p99 latency budget, durability, consistency model, regulatory (GDPR, HIPAA, SOC2).
  • Explicit out-of-scope: "We will not design the mobile client, the billing system, or the CDN provider." This buys you back 5 minutes later when the interviewer asks.

Write NFRs as numbers, never adjectives. "Low latency" is a junior answer. "p99 read < 100 ms in-region, p99 write < 300 ms, 99.95% availability, RPO < 1 min, RTO < 5 min" is a staff answer.
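The availability numbers above fall out of simple arithmetic; a one-liner you can reproduce mentally at the whiteboard:

```python
# Allowed annual downtime implied by an availability target (365-day year).
def downtime_minutes_per_year(availability: float) -> float:
    """Return the downtime budget in minutes per year."""
    return (1 - availability) * 365 * 24 * 60

# three nines (99.9%)  -> 525.6 min/year ~= 8.76 h
# four nines (99.99%)  ->  52.56 min/year ~= 52 min
```

Quoting the budget in minutes ("four nines means I get 52 minutes of downtime a year, so no manual failover") is a cheap way to show the NFR is internalized, not recited.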

Anti-pattern

Do not start listing databases ("I'll use Postgres and Redis and Kafka...") before requirements are nailed. Interviewers silently flag this as pattern-matching without thought. Stay on requirements until you can state the problem back in one sentence.

Phase 2 — Back-of-envelope scale (5 min)

Convert product assumptions into numeric targets. For a chat app: 500 M DAU × 40 messages/day = 20 B messages/day ≈ 230 k writes/sec average, ~5× for peak = 1.15 M writes/sec. Storage at 200 bytes/message × 20 B/day = 4 TB/day ≈ 1.5 PB/year.

You will refer back to these numbers in every later phase. A 1 M QPS system needs horizontal sharding from day one; a 5 k QPS system can live on a single Postgres primary with read replicas. Getting the order of magnitude right is 80% of the value; precision past one significant figure is wasted breath.
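The chat-app estimate above, written out so each intermediate is visible (same inputs as the text):

```python
# Back-of-envelope for the chat-app example: 500M DAU, 40 messages/user/day.
dau = 500_000_000
msgs_per_user_per_day = 40

msgs_per_day = dau * msgs_per_user_per_day      # 20 B messages/day
avg_wps = msgs_per_day / 86_400                 # ~231 k writes/sec average
peak_wps = avg_wps * 5                          # ~1.16 M writes/sec at 5x peak

bytes_per_msg = 200
storage_per_day = msgs_per_day * bytes_per_msg  # 4 TB/day
storage_per_year = storage_per_day * 365        # ~1.46 PB/year
```

One significant figure is the target precision; the point of writing it out is that every later sizing decision (shard count, cache size, Kafka partitions) divides one of these numbers by a per-node constant.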

See the companion page on estimation for the constants (Jeff Dean numbers, latency cheat-sheet, LLM-token economics).

Phase 3 — API surface (5 min)

Define 3–6 endpoints, typed. Example for a URL shortener:

POST /v1/links          { long_url, custom_alias? } -> { short_url, expires_at }
GET  /v1/{alias}        302 -> long_url
GET  /v1/links/{alias}  -> { clicks, created_at, owner }
DELETE /v1/links/{alias}

State explicitly: auth model (JWT, mTLS, API key), idempotency keys on POST, pagination style (cursor vs offset), and rate-limit bucket. For LLM-flavored questions, always mention streaming responses (text/event-stream or gRPC server-streaming) and tool-call schemas.
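A minimal sketch of the idempotency-key behavior on POST, assuming a shared KV store behind the service (the in-memory dict here stands in for something like Redis with a TTL; class and field names are illustrative, not from the source):

```python
# Hypothetical sketch: replaying a POST /v1/links with the same Idempotency-Key
# must return the original response instead of creating a second row.
class IdempotentLinkCreator:
    def __init__(self):
        self._seen: dict[str, dict] = {}  # idempotency_key -> cached response

    def create(self, idempotency_key: str, long_url: str) -> dict:
        if idempotency_key in self._seen:
            return self._seen[idempotency_key]       # replay: no new write
        alias = f"s{abs(hash(long_url)) % 10**6}"    # placeholder alias generation
        resp = {"short_url": alias, "long_url": long_url}
        self._seen[idempotency_key] = resp           # store before returning
        return resp
```

In a real deployment the key→response entry would live in the primary store or a TTL'd cache so retries across service instances still dedupe.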

OpenAI-specific

OpenAI interviews love /v1/chat/completions-shaped APIs. Be ready to discuss SSE framing, stream_options, function-calling JSON schemas, and how you would backpressure a slow client without dropping the generation.
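One way to frame the slow-client answer is a bounded queue between generation and the socket: the producer blocks when the queue is full, and if it stays blocked past a deadline the stream is aborted instead of buffering unboundedly. A hedged asyncio sketch (all names illustrative; `send` stands in for the SSE socket writer):

```python
import asyncio

async def stream_tokens(tokens, send, queue_size: int = 8, stall_timeout: float = 5.0):
    """Stream tokens as SSE frames with backpressure via a bounded queue."""
    q: asyncio.Queue = asyncio.Queue(maxsize=queue_size)

    async def producer():
        for tok in tokens:                      # stands in for model generation
            try:
                await asyncio.wait_for(q.put(tok), timeout=stall_timeout)
            except asyncio.TimeoutError:
                # Client too slow for too long: abort rather than buffer forever.
                raise RuntimeError("client stalled; dropping stream")
        await q.put(None)                       # sentinel: end of stream

    async def consumer():
        while (tok := await q.get()) is not None:
            await send(f"data: {tok}\n\n")      # text/event-stream framing

    await asyncio.gather(producer(), consumer())
```

The design choice worth stating out loud: a full queue pauses generation (freeing GPU batch slots for other requests) rather than growing server memory, and the timeout converts a pathological client into a clean cancellation.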

Phase 4 — Data model & storage (5 min)

Pick a primary store and state the schema in 3–5 lines. Justify the choice with one sentence referencing access pattern, not brand loyalty.

  • OLTP key-value with predictable partition (user-scoped inbox, session store): DynamoDB, Cassandra, FoundationDB.
  • Relational with joins & strong consistency: Postgres (single-region up to ~50 k writes/sec), Spanner / CockroachDB for global.
  • Append-only event log: Kafka + tiered storage, or a purpose-built log like Pulsar.
  • Blob: S3 / GCS, 11 nines durability, first-byte typically tens of ms from the same region.
  • Vector: pgvector (<10 M), Milvus / Pinecone / Turbopuffer (100 M+). See the vector DB arena question.
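"Predictable partition" in the first bullet means the partition key is derivable from the request. A sketch, assuming hash partitioning on the user id (parameters illustrative) so one user's inbox always lands on one shard and reads stay single-partition:

```python
import hashlib

def shard_for(user_id: str, num_shards: int = 64) -> int:
    """Stable shard assignment for a user-scoped inbox on a KV store.

    Uses md5 rather than built-in hash() so the mapping is identical
    across processes and restarts.
    """
    digest = hashlib.md5(user_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

The interview-relevant sentence this buys you: "fetch inbox for user X" touches exactly one partition, which is why a KV store works here and a scatter-gather query pattern would not.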

Phase 5 — High-level architecture (10 min)

Draw 5–8 boxes. Client → load balancer → stateless service tier → cache + primary store + async queue + downstream workers. Label every arrow with the protocol (HTTP/2, gRPC, Kafka topic) and the direction of data flow.

flowchart LR
    C[Client] -->|HTTPS| LB[Global LB / Anycast]
    LB --> API[Stateless API tier]
    API -->|read-through| CACHE[(Redis cluster)]
    API -->|writes| DB[(Primary store)]
    API -->|events| Q[[Kafka]]
    Q --> W[Workers]
    W --> DB
    W -->|cold| OBJ[(S3)]

Call out at least one concrete number per box: "Redis cluster sized for 100 k ops/sec/node, 6 shards, 3 replicas = 1.8 M ops/sec cluster throughput." Numbers separate candidates who have run systems from those who have only read about them.
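The Redis sizing claim above, spelled out (this assumes replicas serve reads, so the figure is aggregate read throughput, not write throughput):

```python
# Cluster sizing arithmetic for the Redis example.
ops_per_node = 100_000
shards = 6
replicas_per_shard = 3

nodes = shards * replicas_per_shard          # 18 nodes total
cluster_read_ops = nodes * ops_per_node      # 1.8 M ops/sec aggregate reads
write_ops_ceiling = shards * ops_per_node    # writes only go to primaries: 600 k/sec
```

Distinguishing the read ceiling (all nodes) from the write ceiling (primaries only) is exactly the kind of number-per-box callout the paragraph above asks for.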

Phase 6 — Deep dive + wrap (15 min)

The interviewer will nominate one or two components for deep-dive. Common targets:

  • How do you keep the cache consistent with the primary? → write-through vs cache-aside, invalidation lag, the 2013 Facebook memcached paper's "leases" trick.
  • What if the leader DB fails? → failover playbook: consensus on new leader (etcd/Raft), fencing tokens (see consensus), RPO window, split-brain avoidance.
  • How do you rate-limit? → token bucket at the edge, sliding-window log at user scope (see rate-limiter arena).
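For the rate-limiting bullet, a minimal token-bucket sketch is worth having memorized (field names illustrative; a production version would live in Redis or at the edge proxy, not in process memory):

```python
import time

class TokenBucket:
    """Token bucket: allows bursts up to `capacity`, sustained rate `rate`/sec."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity      # start full
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Lazy refill: add tokens for the elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

The trade-off to narrate: token bucket is O(1) memory per key and burst-friendly, while the sliding-window log is exact but costs memory proportional to the request rate; hence bucket at the edge, log only at user scope.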

Reserve 2 minutes at the end for an explicit trade-off summary: "If we needed global strong consistency I would swap Postgres for Spanner and accept 2× write latency; if read volume doubled, add regional read replicas with bounded staleness." This is the single highest-signal minute of the interview.

OpenAI vs Anthropic-specific tweaks

The framework is the same; the emphasis shifts.

| Dimension | OpenAI style | Anthropic style |
| --- | --- | --- |
| Opening question | Product-flavored ("Design ChatGPT memory") | Safety/eval-flavored ("Design a red-team harness") |
| Depth target | GPU scheduling, KV-cache sharing, streaming protocol | Policy pipeline, audit logs, rollback, sandboxing |
| Preferred stack cues | Python + Rust services, Triton, vLLM, Kubernetes | Python + Rust, custom inference stack, strong typing, extensive evals |
| Must-mention | Throughput/$ and p50 vs p99 under batching | Constitutional AI feedback loop, refusal metrics, provenance |

Anthropic-specific

Anthropic interviewers routinely pivot to safety: "What happens if a prompt tries to exfiltrate another user's context?" Have a canned answer involving tenant isolation at the process level, scrubbed logs, and a circuit breaker that kills generations on policy hits. Reference Constitutional AI and the threat modeling page if they push.

Final anti-pattern

Running out of time on Phase 5 and never reaching deep-dive is the single most common reason strong candidates fail. Set a mental timer: if you are still drawing boxes at minute 25, stop, pick one component, and go deep.
