Three questions, one mental model

Facebook news feed, Slack chat, and a Twilio-style notification service look different, but all three reduce to: how does a producer event reach N consumers, and where does the fan-out live? Master the fan-out spectrum and you can answer any of them on a whiteboard.

  • News feed — 1 author post → ~200 followers (Twitter median) or 100M followers (celebrity). Asynchronous, minutes-tolerant.
  • Chat — 1 message → 2 people (DM) or 10k (Slack channel). Synchronous, seconds-tolerant, ordered, presence-sensitive.
  • Notifications — 1 event → 1 user across N channels (APNs, FCM, SMS, email). Fire-and-forget, minutes-tolerant, strict idempotency.

The shared mental model: producer → pub/sub log (Kafka) → fan-out workers → per-consumer materialized view. The variable is what the materialized view is (feed, inbox, mailbox) and how fresh it must be.

Source cross-reference

Alex Xu V1 Ch.10 (notifications), Ch.11 (news feed), Ch.12 (chat) are the textbook answers. Acing SDI Ch.14 (messaging) and Ch.16 (news feed) add functional-partitioning depth.

News feed: fan-out on write vs read

The core trade-off:

| Strategy | Write cost | Read cost | Storage | Freshness |
| --- | --- | --- | --- | --- |
| Fan-out on write (push) | O(followers) writes to follower timelines | O(1) read from your timeline | Duplicate post in each follower's list | Fast reads; slow for celebrities |
| Fan-out on read (pull) | O(1) write to author's posts | O(followees) reads + merge | Single copy | Always fresh, expensive reads |
| Hybrid | Push for normal users, pull for celebrities | O(1) + a few pulls | Mostly duplicated | Best of both |

Numbers: Twitter's median user has ~200 followers; the 99th percentile has 10k+; Taylor Swift has 90M. A pure push model for her generates 90M Redis writes per tweet — tractable only because Twitter batches and offloads the work to a dedicated "fanout" tier (overall write volume is ~6k tweets/s on average, with recorded spikes above 140k/s). A pure pull model would require reading from hundreds of followees on every feed load for a typical user — the merge is too slow.

flowchart TB
  A[User posts] --> K[Kafka: post_events]
  K --> FW[Fanout Workers]
  FW -->|if author.followers<10k| T[(Redis: per-user timeline ZSET)]
  FW -->|if celebrity| C[(Celebrity posts store)]
  U[Viewer opens feed] --> API[Feed API]
  API --> T
  API --> C
  API -->|merge+rank| U

The celebrity problem (hybrid fan-out)

The classic interview trap: if you pick push, the hiring manager asks "what about Taylor Swift?" If you pick pull, they ask "what about 500M DAU reading their feed?"

The answer is hybrid:

  1. Define a threshold (e.g., >10k followers = celebrity).
  2. Normal users: fan-out on write into follower timeline ZSETs (ZADD timeline:{follower_id} ts post_id).
  3. Celebrities: skip fan-out; writers emit to a per-celebrity post list.
  4. On read, merge the follower's timeline ZSET with the recent posts of each celebrity they follow (usually 0–50 celebs).
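The four steps above can be sketched with in-memory stand-ins for the Redis ZSET and celebrity post store (all names here — TIMELINE, CELEB_POSTS, publish, read_feed — are illustrative, not from any real codebase):

```python
import heapq

CELEB_THRESHOLD = 10_000                 # followers above this = celebrity, skip fan-out

# In-memory stand-ins for Redis structures.
TIMELINE = {}      # user_id -> heap of (ts, post_id), the per-follower timeline "ZSET"
CELEB_POSTS = {}   # celeb_id -> list of (ts, post_id), the per-celebrity post store

def publish(author_id, followers, ts, post_id, follower_ids):
    """Fan-out on write for normal users; append-only store for celebrities."""
    if followers > CELEB_THRESHOLD:
        CELEB_POSTS.setdefault(author_id, []).append((ts, post_id))
    else:
        for fid in follower_ids:
            heapq.heappush(TIMELINE.setdefault(fid, []), (ts, post_id))

def read_feed(user_id, followed_celebs, limit=50):
    """Merge the precomputed timeline with each followed celebrity's recent posts."""
    candidates = list(TIMELINE.get(user_id, []))
    for celeb in followed_celebs:
        candidates.extend(CELEB_POSTS.get(celeb, [])[-limit:])
    # Newest first, like ZREVRANGE followed by a k-way merge.
    return [post for _, post in sorted(candidates, reverse=True)[:limit]]
```

The read path only touches the follower's own timeline plus the 0–50 celebrities they follow, which is what makes the hybrid cheap on both sides.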

Ranking (for Facebook-style feeds) is a separate ML step on top: retrieve 2000 candidates, rerank with a model, paginate. Chip Huyen Ch.6 discusses the offline/online evaluation of ranking. Khang Pham covers the two-tower retrieval architecture.

Anti-pattern

Materializing timelines with SQL SELECT * FROM posts WHERE author IN (followees) ORDER BY ts DESC LIMIT 50. This reads millions of rows for any user who follows 500 people. Always use a precomputed timeline or a time-partitioned read.

Chat: WebSocket, presence, delivery semantics

Transport

  • Long polling — fallback; high overhead, extra latency.
  • Server-Sent Events (SSE) — one-way push; good for notifications, not chat.
  • WebSocket — bidirectional, single TCP connection, wss://. Every modern chat uses it (Slack, Discord, Telegram web).

A single WebSocket server can hold ~50k–100k concurrent connections with tuning (ulimit, kernel socket buffers, epoll). Slack holds tens of millions of concurrent sockets across its fleet.

Architecture

flowchart LR
  U1[User A socket] --> GW[WS Gateway<br/>sticky by user_id]
  U2[User B socket] --> GW
  GW --> S[Session Store<br/>user_id to gateway_host]
  GW --> PS[Pub/Sub: Kafka or Redis]
  PS --> GW
  GW --> MS[Message Service]
  MS --> DB[(Messages DB<br/>partition by channel_id)]
  MS --> PR[Presence Service<br/>Redis TTL heartbeat]

Presence

Presence (online/offline/away) is deceptively hard. Naive "update a row" = write amplification on every heartbeat. Standard answer: Redis key presence:{user_id} with TTL 30s, refreshed by heartbeat. To notify friends, fan-out presence changes into channel Pub/Sub. Slack channels use lazy presence: only compute online members when someone opens the member list.
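A minimal sketch of the TTL-heartbeat pattern, with a plain dict standing in for Redis SETEX/EXPIRE semantics (the 30s TTL matches the text; everything else is illustrative):

```python
import time

PRESENCE_TTL = 30.0          # seconds, matching SETEX presence:{user_id} 30
_store = {}                  # key -> expiry timestamp (stand-in for Redis)

def heartbeat(user_id, now=None):
    """Client calls this periodically; refreshes the TTL on the presence key."""
    now = time.time() if now is None else now
    _store[f"presence:{user_id}"] = now + PRESENCE_TTL

def is_online(user_id, now=None):
    """Key exists and has not expired -> online. A crashed client simply
    stops heartbeating and the key expires on its own — no explicit 'offline' write."""
    now = time.time() if now is None else now
    expiry = _store.get(f"presence:{user_id}")
    return expiry is not None and expiry > now
```

The point of the TTL is that disconnect detection is passive: offline is the absence of a heartbeat, so there is no write on the failure path at all.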

Delivery semantics

  • At-least-once is the default. Client dedupes by (channel_id, msg_seq).
  • Ordered within a channel: partition Kafka by channel_id; messages in one partition are ordered. Across channels, no order guarantee.
  • Read receipts: last-read pointer per (user, channel); O(1) updates, O(1) unread count via latest_seq - last_read_seq.
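The dedup key and last-read pointer above can be sketched as follows (hypothetical class and field names; a real client would persist this state):

```python
class ChannelClient:
    """At-least-once consumer: dedupe by (channel_id, seq), track unread via pointers."""

    def __init__(self):
        self.messages = {}        # channel_id -> {seq: payload}; dedup happens on insert
        self.latest_seq = {}      # channel_id -> highest seq seen
        self.last_read = {}       # channel_id -> last-read pointer

    def on_message(self, channel_id, seq, payload):
        """Redelivery of the same (channel_id, seq) is a no-op."""
        inbox = self.messages.setdefault(channel_id, {})
        if seq in inbox:
            return False          # duplicate from an at-least-once retry
        inbox[seq] = payload
        self.latest_seq[channel_id] = max(self.latest_seq.get(channel_id, 0), seq)
        return True

    def mark_read(self, channel_id, seq):
        self.last_read[channel_id] = seq

    def unread(self, channel_id):
        """O(1): latest_seq - last_read_seq, no message scan."""
        return self.latest_seq.get(channel_id, 0) - self.last_read.get(channel_id, 0)
```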

Notifications: push / SMS / email fan-out

A notification system accepts an event (e.g., "order shipped") and fans it out across channels subject to user preferences, throttling, and deliverability rules.

flowchart LR
  EV[Event Sources] --> IQ[Ingest Queue]
  IQ --> NS[Notification Service]
  NS -->|pref lookup| PR[(Prefs DB)]
  NS -->|render template| TP[Template Service]
  NS --> Q1[APNs queue]
  NS --> Q2[FCM queue]
  NS --> Q3[Twilio SMS]
  NS --> Q4[SES email]
  Q1 --> A[APNs]
  Q2 --> G[FCM]
  Q3 --> T[Carrier]
  Q4 --> E[SMTP]
  A --> DLQ1[DLQ on invalid token]

Hard parts

  • Idempotency key = (event_id, user_id, channel). Without it, a worker retry double-texts the user at 2am.
  • Rate limits per user — no more than K notifications per hour to avoid spam fatigue. Token bucket per user.
  • DLQ and invalid-token cleanup — APNs returns "BadDeviceToken"; you must delete that token or you'll leak capacity forever.
  • Deliverability — email domain reputation (DKIM/SPF/DMARC), SMS opt-out (STOP keyword compliance), APNs silent pushes.
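The first two bullets compose naturally: check the (event_id, user_id, channel) idempotency key, then a per-user token bucket. A sketch with in-memory state (a real system would use Redis SETNX for the key set; the K=5/hour default is illustrative):

```python
import time

class NotificationGate:
    """Dedup by idempotency key, then rate-limit per user with a token bucket."""

    def __init__(self, per_hour=5):
        self.seen = set()                 # sent idempotency keys (Redis SETNX in prod)
        self.rate = per_hour / 3600.0     # token refill per second
        self.burst = per_hour             # bucket capacity
        self.buckets = {}                 # user_id -> (tokens, last_refill_ts)

    def allow(self, event_id, user_id, channel, now=None):
        now = time.time() if now is None else now
        key = (event_id, user_id, channel)
        if key in self.seen:
            return False                  # retry of an already-sent notification
        tokens, last = self.buckets.get(user_id, (self.burst, now))
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens < 1:
            return False                  # user is over their hourly budget
        self.buckets[user_id] = (tokens - 1, now)
        self.seen.add(key)
        return True
```

Note the ordering: the idempotency check comes first so that a retry of an already-delivered notification never consumes a rate-limit token.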

OpenAI/Anthropic angle

LLM platforms have internal notification systems for safety escalations: content classifier flags → on-call paging + email digest + Slack. Anthropic publishes Claude system cards and uses similar fan-out for policy reviewers — the same architecture as any consumer notification system, but with stricter audit logging.

Cross-cutting concerns and interview checklist

Shared concerns

  • Ordering — partition by the entity that defines order (channel_id for chat, user_id for feed, event_id for notifications).
  • Idempotency — producer retry + consumer dedup via (source, seq).
  • Backpressure — bounded queues + shed or buffer on spike.
  • Observability — end-to-end latency histograms from event to delivered, per channel.

Anti-patterns

  • Using a relational DB for the active timeline. Redis sorted sets give you O(log N) inserts and O(log N + K) top-K reads — Postgres will cry.
  • Broadcasting presence on every keystroke. Debounce to 1 update / 30s.
  • One Kafka partition for all chat messages. You lose parallelism; channels block each other.
  • Retrying failed SMS forever. Set a retry cap (e.g., 3) and move to DLQ — carriers rate-limit aggressively.

Whiteboard checklist: clarify event type and consumer count; pick fan-out strategy (write/read/hybrid); choose transport (WS for chat, queue for notifications, ZSET for feeds); define idempotency key; specify ordering and delivery semantics; add presence/rate limits; close with observability.
