Why a framework at all

A system design interview is not an oral exam on distributed systems trivia; it is a structured conversation under time pressure where the interviewer scores your engineering judgment. Alex Xu's 4-step template (scope → high-level → deep-dive → wrap) and Zhiyong Tan's longer checklist in Acing the System Design Interview Chapter 2 both converge on the same idea: spend the first 10 minutes nailing the problem, the next 15 sketching a reasonable baseline, and the last 20 demonstrating depth on one or two components the interviewer cares about.

The framework below is the tightest version that still survives a 45-minute slot. It prevents the two most common failure modes: premature architecture (drawing boxes before you know the QPS) and shallow breadth (describing seven services at the same level of detail, none of them deeply).

Source cross-reference

Alex Xu V1 Ch.3 gives the 4-step skeleton; Acing SDI Ch.2 adds the requirement-discussion taxonomy (functional, non-functional, constraints); Chip Huyen DMLS Ch.2 supplies the ML-specific framing (objective → data → model → deployment) you'll need for any LLM/ML question.

Phase 1 — Requirements & scoping (5 min)

Before touching the whiteboard, extract three lists:

  • Functional requirements: the two or three core user journeys. "Upload video, watch video, search video" is enough for YouTube; do not list 15 features.
  • Non-functional requirements (NFRs): availability target (three-nines = ~8.76 h downtime/year, four-nines = ~52 min), p99 latency budget, durability, consistency model, regulatory (GDPR, HIPAA, SOC2).
  • Explicit out-of-scope: "We will not design the mobile client, the billing system, or the CDN provider." This buys you back 5 minutes later when the interviewer asks.

Write NFRs as numbers, never adjectives. "Low latency" is a junior answer. "p99 read < 100 ms in-region, p99 write < 300 ms, 99.95% availability, RPO < 1 min, RTO < 5 min" is a staff answer.
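The availability numbers above fall out of simple arithmetic; a one-liner you can reproduce mentally at the whiteboard:

```python
# Allowed annual downtime implied by an availability target (365-day year).
def downtime_minutes_per_year(availability: float) -> float:
    """Return the downtime budget in minutes per year."""
    return (1 - availability) * 365 * 24 * 60

# three nines (99.9%)  -> 525.6 min/year ~= 8.76 h
# four nines (99.99%)  ->  52.56 min/year ~= 52 min
```

Quoting the budget in minutes ("four nines means I get 52 minutes of downtime a year, so no manual failover") is a cheap way to show the NFR is internalized, not recited.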

Anti-pattern

Do not start listing databases ("I'll use Postgres and Redis and Kafka...") before requirements are nailed. Interviewers silently flag this as pattern-matching without thought. Stay on requirements until you can state the problem back in one sentence.

Phase 2 — Back-of-envelope scale (5 min)

Convert product assumptions into numeric targets. For a chat app: 500 M DAU × 40 messages/day = 20 B messages/day ≈ 230 k writes/sec average, ~5× for peak = 1.15 M writes/sec. Storage at 200 bytes/message × 20 B/day = 4 TB/day ≈ 1.5 PB/year.

You will refer back to these numbers in every later phase. A 1 M QPS system needs horizontal sharding from day one; a 5 k QPS system can live on a single Postgres primary with read replicas. Getting the order of magnitude right is 80% of the value; precision past one significant figure is wasted breath.
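The chat-app estimate above, written out so each intermediate is visible (same inputs as the text):

```python
# Back-of-envelope for the chat-app example: 500M DAU, 40 messages/user/day.
dau = 500_000_000
msgs_per_user_per_day = 40

msgs_per_day = dau * msgs_per_user_per_day      # 20 B messages/day
avg_wps = msgs_per_day / 86_400                 # ~231 k writes/sec average
peak_wps = avg_wps * 5                          # ~1.16 M writes/sec at 5x peak

bytes_per_msg = 200
storage_per_day = msgs_per_day * bytes_per_msg  # 4 TB/day
storage_per_year = storage_per_day * 365        # ~1.46 PB/year
```

One significant figure is the target precision; the point of writing it out is that every later sizing decision (shard count, cache size, Kafka partitions) divides one of these numbers by a per-node constant.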

See the companion page on estimation for the constants (Jeff Dean numbers, latency cheat-sheet, LLM-token economics).

Phase 3 — API surface (5 min)

Define 3–6 endpoints, typed. Example for a URL shortener:

POST /v1/links          { long_url, custom_alias? } -> { short_url, expires_at }
GET  /v1/{alias}        302 -> long_url
GET  /v1/links/{alias}  -> { clicks, created_at, owner }
DELETE /v1/links/{alias}

State explicitly: auth model (JWT, mTLS, API key), idempotency keys on POST, pagination style (cursor vs offset), and rate-limit bucket. For LLM-flavored questions, always mention streaming responses (text/event-stream or gRPC server-streaming) and tool-call schemas.
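A minimal sketch of the idempotency-key behavior on POST, assuming a shared KV store behind the service (the in-memory dict here stands in for something like Redis with a TTL; class and field names are illustrative, not from the source):

```python
# Hypothetical sketch: replaying a POST /v1/links with the same Idempotency-Key
# must return the original response instead of creating a second row.
class IdempotentLinkCreator:
    def __init__(self):
        self._seen: dict[str, dict] = {}  # idempotency_key -> cached response

    def create(self, idempotency_key: str, long_url: str) -> dict:
        if idempotency_key in self._seen:
            return self._seen[idempotency_key]       # replay: no new write
        alias = f"s{abs(hash(long_url)) % 10**6}"    # placeholder alias generation
        resp = {"short_url": alias, "long_url": long_url}
        self._seen[idempotency_key] = resp           # store before returning
        return resp
```

In a real deployment the key→response entry would live in the primary store or a TTL'd cache so retries across service instances still dedupe.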

OpenAI-specific

OpenAI interviews love /v1/chat/completions-shaped APIs. Be ready to discuss SSE framing, stream_options, function-calling JSON schemas, and how you would backpressure a slow client without dropping the generation.
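One way to frame the slow-client answer is a bounded queue between generation and the socket: the producer blocks when the queue is full, and if it stays blocked past a deadline the stream is aborted instead of buffering unboundedly. A hedged asyncio sketch (all names illustrative; `send` stands in for the SSE socket writer):

```python
import asyncio

async def stream_tokens(tokens, send, queue_size: int = 8, stall_timeout: float = 5.0):
    """Stream tokens as SSE frames with backpressure via a bounded queue."""
    q: asyncio.Queue = asyncio.Queue(maxsize=queue_size)

    async def producer():
        for tok in tokens:                      # stands in for model generation
            try:
                await asyncio.wait_for(q.put(tok), timeout=stall_timeout)
            except asyncio.TimeoutError:
                # Client too slow for too long: abort rather than buffer forever.
                raise RuntimeError("client stalled; dropping stream")
        await q.put(None)                       # sentinel: end of stream

    async def consumer():
        while (tok := await q.get()) is not None:
            await send(f"data: {tok}\n\n")      # text/event-stream framing

    await asyncio.gather(producer(), consumer())
```

The design choice worth stating out loud: a full queue pauses generation (freeing GPU batch slots for other requests) rather than growing server memory, and the timeout converts a pathological client into a clean cancellation.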

Phase 4 — Data model & storage (5 min)

Pick a primary store and state the schema in 3–5 lines. Justify the choice with one sentence referencing access pattern, not brand loyalty.

  • OLTP key-value with predictable partition (user-scoped inbox, session store): DynamoDB, Cassandra, FoundationDB.
  • Relational with joins & strong consistency: Postgres (single-region up to ~50 k writes/sec), Spanner / CockroachDB for global.
  • Append-only event log: Kafka + tiered storage, or a purpose-built log like Pulsar.
  • Blob: S3 / GCS, 11 nines durability, first-byte typically tens of ms from the same region.
  • Vector: pgvector (<10 M), Milvus / Pinecone / Turbopuffer (100 M+). See the vector DB arena question.
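"Predictable partition" in the first bullet means the partition key is derivable from the request. A sketch, assuming hash partitioning on the user id (parameters illustrative) so one user's inbox always lands on one shard and reads stay single-partition:

```python
import hashlib

def shard_for(user_id: str, num_shards: int = 64) -> int:
    """Stable shard assignment for a user-scoped inbox on a KV store.

    Uses md5 rather than built-in hash() so the mapping is identical
    across processes and restarts.
    """
    digest = hashlib.md5(user_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

The interview-relevant sentence this buys you: "fetch inbox for user X" touches exactly one partition, which is why a KV store works here and a scatter-gather query pattern would not.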

Phase 5 — High-level architecture (10 min)

Draw 5–8 boxes. Client → load balancer → stateless service tier → cache + primary store + async queue + downstream workers. Label every arrow with the protocol (HTTP/2, gRPC, Kafka topic) and the direction of data flow.

flowchart LR
    C[Client] -->|HTTPS| LB[Global LB / Anycast]
    LB --> API[Stateless API tier]
    API -->|read-through| CACHE[(Redis cluster)]
    API -->|writes| DB[(Primary store)]
    API -->|events| Q[[Kafka]]
    Q --> W[Workers]
    W --> DB
    W -->|cold| OBJ[(S3)]

Call out at least one concrete number per box: "Redis cluster sized for 100 k ops/sec/node, 6 shards, 3 replicas = 1.8 M ops/sec cluster throughput." Numbers separate candidates who have run systems from those who have only read about them.
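The Redis sizing claim above, spelled out (this assumes replicas serve reads, so the figure is aggregate read throughput, not write throughput):

```python
# Cluster sizing arithmetic for the Redis example.
ops_per_node = 100_000
shards = 6
replicas_per_shard = 3

nodes = shards * replicas_per_shard          # 18 nodes total
cluster_read_ops = nodes * ops_per_node      # 1.8 M ops/sec aggregate reads
write_ops_ceiling = shards * ops_per_node    # writes only go to primaries: 600 k/sec
```

Distinguishing the read ceiling (all nodes) from the write ceiling (primaries only) is exactly the kind of number-per-box callout the paragraph above asks for.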

Phase 6 — Deep dive + wrap (15 min)

The interviewer will nominate one or two components for deep-dive. Common targets:

  • How do you keep the cache consistent with the primary? → write-through vs cache-aside, invalidation lag, the 2013 Facebook memcached paper's "leases" trick.
  • What if the leader DB fails? → failover playbook: consensus on new leader (etcd/Raft), fencing tokens (see consensus), RPO window, split-brain avoidance.
  • How do you rate-limit? → token bucket at the edge, sliding-window log at user scope (see rate-limiter arena).
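For the rate-limiting bullet, a minimal token-bucket sketch is worth having memorized (field names illustrative; a production version would live in Redis or at the edge proxy, not in process memory):

```python
import time

class TokenBucket:
    """Token bucket: allows bursts up to `capacity`, sustained rate `rate`/sec."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity      # start full
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Lazy refill: add tokens for the elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

The trade-off to narrate: token bucket is O(1) memory per key and burst-friendly, while the sliding-window log is exact but costs memory proportional to the request rate; hence bucket at the edge, log only at user scope.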

Reserve 2 minutes at the end for an explicit trade-off summary: "If we needed global strong consistency I would swap Postgres for Spanner and accept 2× write latency; if read volume doubled, add regional read replicas with bounded staleness." This is the single highest-signal minute of the interview.

OpenAI vs Anthropic-specific tweaks

The framework is the same; the emphasis shifts.

| Dimension | OpenAI style | Anthropic style |
| --- | --- | --- |
| Opening question | Product-flavored ("Design ChatGPT memory") | Safety/eval-flavored ("Design a red-team harness") |
| Depth target | GPU scheduling, KV-cache sharing, streaming protocol | Policy pipeline, audit logs, rollback, sandboxing |
| Preferred stack cues | Python + Rust services, Triton, vLLM, Kubernetes | Python + Rust, custom inference stack, strong typing, extensive evals |
| Must-mention | Throughput/$ and p50 vs p99 under batching | Constitutional AI feedback loop, refusal metrics, provenance |

Anthropic-specific

Anthropic interviewers routinely pivot to safety: "What happens if a prompt tries to exfiltrate another user's context?" Have a canned answer involving tenant isolation at the process level, scrubbed logs, and a circuit breaker that kills generations on policy hits. Reference Constitutional AI and the threat modeling page if they push.

Final anti-pattern

Running out of time on Phase 5 and never reaching deep-dive is the single most common reason strong candidates fail. Set a mental timer: if you are still drawing boxes at minute 25, stop, pick one component, and go deep.
