Three questions, one mental model
Facebook news feed, Slack chat, and a Twilio-style notification service look different, but all three reduce to: how does a producer event reach N consumers, and where does the fan-out live? Master the fan-out spectrum and you can answer any of them on a whiteboard.
- News feed — 1 author post → 1,000 followers (Twitter median) or 100M followers (celebrity). Asynchronous, minutes-tolerant.
- Chat — 1 message → 2 people (DM) or 10k (Slack channel). Synchronous, seconds-tolerant, ordered, presence-sensitive.
- Notifications — 1 event → 1 user across N channels (APNs, FCM, SMS, email). Fire-and-forget, minutes-tolerant, strict idempotency.
The shared mental model: producer → pub/sub log (Kafka) → fan-out workers → per-consumer materialized view. The variable is what the materialized view is (feed, inbox, mailbox) and how fresh it must be.
Source cross-reference
Alex Xu V1 Ch.10 (notifications), Ch.11 (news feed), Ch.12 (chat) are the textbook answers. Acing SDI Ch.14 (messaging) and Ch.16 (news feed) add functional-partitioning depth.
News feed: fan-out on write vs read
The core trade-off:
| Strategy | Write cost | Read cost | Storage | Freshness |
|---|---|---|---|---|
| Fan-out on write (push) | O(followers) writes to follower timelines | O(1) read from your timeline | Duplicate post in each follower's list | Slow for celebrities, fast reads |
| Fan-out on read (pull) | O(1) write to author's posts | O(followees) reads + merge | Single copy | Always fresh, expensive reads |
| Hybrid | Push for normal users, pull for celebrities | O(1) + a few pulls | Mostly duplicated | Best of both |
Numbers: Twitter's median user has ~200 followers; the 99th percentile has 10k+; Taylor Swift has ~90M. A pure push model generates 90M timeline writes for each of her tweets — even with batching on a dedicated "fanout" tier, this is exactly why real systems exempt celebrities from push. A pure pull model makes a typical user's feed read fetch and merge recent posts from hundreds of followees on every request — the merge is too slow at read volume.
```mermaid
flowchart TB
    A[User posts] --> K[Kafka: post_events]
    K --> FW[Fanout Workers]
    FW -->|if author.followers < 10k| T[(Redis: per-user timeline ZSET)]
    FW -->|if celebrity| C[(Celebrity posts store)]
    U[Viewer opens feed] --> API[Feed API]
    API --> T
    API --> C
    API -->|merge+rank| U
```
The celebrity problem (hybrid fan-out)
The classic interview trap: if you pick push, the hiring manager asks "what about Taylor Swift?" If you pick pull, they ask "what about 500M DAU reading their feed?"
The answer is hybrid:
- Define a threshold (e.g., >10k followers = celebrity).
- Normal users: fan-out on write into follower timeline ZSETs (`ZADD timeline:{follower_id} ts post_id`).
- Celebrities: skip fan-out; writers emit to a per-celebrity post list.
- On read, merge the follower's timeline ZSET with the recent posts of each celebrity they follow (usually 0–50 celebs).
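The hybrid write/read paths above fit in a few dozen lines. A minimal sketch, with plain Python dicts standing in for the Redis ZSETs and celebrity store (names like `FeedStore` and `CELEB_THRESHOLD` are illustrative, not from any real codebase):

```python
import heapq
from collections import defaultdict

CELEB_THRESHOLD = 10_000  # >10k followers = celebrity, per the rule above

class FeedStore:
    """In-memory stand-in for Redis: timelines emulate per-user ZSETs
    (posts sorted by timestamp); celebrity posts live in a separate store."""
    def __init__(self):
        self.followers = defaultdict(set)        # author -> follower ids
        self.timelines = defaultdict(list)       # user -> [(ts, post_id)]
        self.celeb_posts = defaultdict(list)     # celeb -> [(ts, post_id)]
        self.celebs_followed = defaultdict(set)  # user -> celeb ids

    def post(self, author, ts, post_id):
        if len(self.followers[author]) > CELEB_THRESHOLD:
            # Celebrity path: skip fan-out, write a single copy.
            self.celeb_posts[author].append((ts, post_id))
        else:
            # Normal path: fan-out on write, one ZADD per follower.
            for f in self.followers[author]:
                self.timelines[f].append((ts, post_id))

    def read_feed(self, user, limit=50):
        # Merge the precomputed timeline with each followed celebrity's posts.
        sources = [self.timelines[user]] + [
            self.celeb_posts[c] for c in self.celebs_followed[user]]
        merged = heapq.merge(*[sorted(s, reverse=True) for s in sources],
                             reverse=True)
        return [post_id for _, post_id in list(merged)[:limit]]
```

The read path merges at most 1 timeline plus the handful of celebrities the viewer follows, which is why hybrid keeps reads cheap.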
Ranking (for Facebook-style feeds) is a separate ML step on top: retrieve 2000 candidates, rerank with a model, paginate. Chip Huyen Ch.6 discusses the offline/online evaluation of ranking. Khang Pham covers the two-tower retrieval architecture.
Anti-pattern
Materializing timelines with `SELECT * FROM posts WHERE author IN (followees) ORDER BY ts DESC LIMIT 50`. This scans every recent post by every followee — millions of rows for a user who follows 500 active accounts. Always use a precomputed timeline or a time-partitioned read.
Chat: WebSocket, presence, delivery semantics
Transport
- Long polling — fallback; high overhead, extra latency.
- Server-Sent Events (SSE) — one-way push; good for notifications, not chat.
- WebSocket — bidirectional, single TCP connection, `wss://`. Every modern chat uses it (Slack, Discord, Telegram web).
A single WebSocket server can hold ~50k–100k concurrent connections (with tuned ulimit, kernel socket buffers, and epoll). Slack holds tens of millions of concurrent sockets across its fleet.
Architecture
```mermaid
flowchart LR
    U1[User A socket] --> GW[WS Gateway<br/>sticky by user_id]
    U2[User B socket] --> GW
    GW --> S[Session Store<br/>user_id to gateway_host]
    GW --> PS[Pub/Sub: Kafka or Redis]
    PS --> GW
    GW --> MS[Message Service]
    MS --> DB[(Messages DB<br/>partition by channel_id)]
    MS --> PR[Presence Service<br/>Redis TTL heartbeat]
```
Presence
Presence (online/offline/away) is deceptively hard. The naive approach — update a DB row on every heartbeat — is pure write amplification. The standard answer: a Redis key `presence:{user_id}` with a 30s TTL, refreshed by heartbeat. To notify friends, fan out presence changes over channel Pub/Sub. Slack channels use lazy presence: online members are only computed when someone opens the member list.
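The TTL-heartbeat scheme is tiny to express. A sketch with an injectable clock standing in for `SET presence:{user_id} ... EX 30` (the `Presence` class name and 30s constant mirror the description above, nothing more):

```python
import time

PRESENCE_TTL = 30.0  # seconds, matching the Redis key TTL above

class Presence:
    """In-memory stand-in for a Redis key with TTL: a heartbeat refreshes
    the expiry; a user counts as online while the expiry is in the future."""
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.expires = {}  # user_id -> expiry timestamp

    def heartbeat(self, user_id):
        self.expires[user_id] = self.clock() + PRESENCE_TTL

    def is_online(self, user_id):
        return self.expires.get(user_id, 0.0) > self.clock()
```

Missing two consecutive heartbeats flips the user offline automatically — no explicit "disconnect" write is ever needed, which is the whole point of the TTL.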
Delivery semantics
- At-least-once is the default. The client dedupes by `(channel_id, msg_seq)`.
- Ordered within a channel: partition Kafka by `channel_id`; messages in one partition are ordered. Across channels, no order guarantee.
- Read receipts: a last-read pointer per (user, channel); O(1) updates, O(1) unread count via `latest_seq - last_read_seq`.
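All three bullets land in one small client-side structure. A sketch (the `ChannelInbox` name is illustrative) showing dedup by `msg_seq` and the pointer-based unread count:

```python
class ChannelInbox:
    """Client-side view of one channel: dedupes at-least-once redeliveries
    by msg_seq and tracks unread via latest_seq - last_read_seq."""
    def __init__(self):
        self.messages = {}       # msg_seq -> payload
        self.latest_seq = 0
        self.last_read_seq = 0

    def deliver(self, msg_seq, payload):
        if msg_seq in self.messages:
            return False         # duplicate from an at-least-once retry
        self.messages[msg_seq] = payload
        self.latest_seq = max(self.latest_seq, msg_seq)
        return True

    def mark_read(self, msg_seq):
        # Pointer only moves forward; a stale receipt cannot regress it.
        self.last_read_seq = max(self.last_read_seq, msg_seq)

    def unread(self):
        return self.latest_seq - self.last_read_seq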
Notifications: push / SMS / email fan-out
A notification system accepts an event (e.g., "order shipped") and fans it out across channels subject to user preferences, throttling, and deliverability rules.
```mermaid
flowchart LR
    EV[Event Sources] --> IQ[Ingest Queue]
    IQ --> NS[Notification Service]
    NS -->|pref lookup| PR[(Prefs DB)]
    NS -->|render template| TP[Template Service]
    NS --> Q1[APNs queue]
    NS --> Q2[FCM queue]
    NS --> Q3[Twilio SMS]
    NS --> Q4[SES email]
    Q1 --> A[APNs]
    Q2 --> G[FCM]
    Q3 --> T[Carrier]
    Q4 --> E[SMTP]
    A --> DLQ1[DLQ on invalid token]
```
- Idempotency key = `(event_id, user_id, channel)`. Without it, a worker retry double-texts the user at 2am.
- Rate limits per user — no more than K notifications per hour to avoid spam fatigue; enforce with a per-user token bucket.
- DLQ and invalid-token cleanup — APNs returns "BadDeviceToken"; you must delete that token or you'll leak capacity forever.
- Deliverability — email domain reputation (DKIM/SPF/DMARC), SMS opt-out (STOP keyword compliance), APNs silent pushes.
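The idempotency check and per-user token bucket compose naturally in the worker. A sketch under stated assumptions — `Notifier` is a hypothetical name, the dedup set would really be a TTL'd Redis set, and the bucket state a Redis hash:

```python
import time

class Notifier:
    """Worker-side guards: an idempotency set keyed by
    (event_id, user_id, channel) plus a per-user token bucket (K per hour)."""
    def __init__(self, rate_per_hour=5, clock=time.monotonic):
        self.sent = set()                    # seen idempotency keys
        self.rate = rate_per_hour / 3600.0   # token refill per second
        self.burst = rate_per_hour           # bucket capacity
        self.buckets = {}                    # user_id -> (tokens, last_ts)
        self.clock = clock

    def _take_token(self, user_id):
        now = self.clock()
        tokens, last = self.buckets.get(user_id, (self.burst, now))
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens < 1:
            return False
        self.buckets[user_id] = (tokens - 1, now)
        return True

    def send(self, event_id, user_id, channel):
        key = (event_id, user_id, channel)
        if key in self.sent:
            return "duplicate"               # a retried delivery, skip it
        if not self._take_token(user_id):
            return "throttled"               # spam-fatigue cap hit
        self.sent.add(key)                   # record BEFORE handing to channel
        return "sent"
```

Order matters: the idempotency check runs before the rate limiter, so a retried event never burns a token twice.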
OpenAI/Anthropic angle
LLM platforms have internal notification systems for safety escalations: content classifier flags → on-call paging + email digest + Slack. Anthropic publishes Claude safety cards and uses similar fan-out for policy reviewers — the same architecture as any consumer notification system, but with stricter audit logging.
Cross-cutting concerns and interview checklist
Shared concerns
- Ordering — partition by the entity that defines order (channel_id for chat, user_id for feed, event_id for notifications).
- Idempotency — producer retry + consumer dedup via (source, seq).
- Backpressure — bounded queues + shed or buffer on spike.
- Observability — end-to-end latency histograms from event to delivered, per channel.
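The backpressure bullet — bounded queues that shed on spike — is a one-screen sketch. Here `BoundedQueue` is illustrative; a real system would also export `shed` as a metric for the observability bullet:

```python
from collections import deque

class BoundedQueue:
    """Bounded buffer that sheds the oldest item on overflow, so a traffic
    spike degrades gracefully instead of exhausting memory."""
    def __init__(self, capacity):
        self.q = deque(maxlen=capacity)  # deque drops from the head when full
        self.shed = 0                    # dropped-event counter, for metrics

    def offer(self, item):
        if len(self.q) == self.q.maxlen:
            self.shed += 1               # oldest item is about to be evicted
        self.q.append(item)

    def poll(self):
        return self.q.popleft() if self.q else None
```

Shedding oldest-first suits feeds and presence (fresh data supersedes stale); for chat or notifications you would instead block or spill to a durable queue, since dropping messages is not acceptable there.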
Anti-patterns
- Using a relational DB for the active timeline ZSET. Redis sorted sets give you O(log N) inserts and O(1) top-K reads — Postgres will cry.
- Broadcasting presence on every keystroke. Debounce to 1 update / 30s.
- One Kafka partition for all chat messages. You lose parallelism; channels block each other.
- Retrying failed SMS forever. Set a retry cap (e.g., 3) and move to DLQ — carriers rate-limit aggressively.
Whiteboard checklist: clarify event type and consumer count; pick fan-out strategy (write/read/hybrid); choose transport (WS for chat, queue for notifications, ZSET for feeds); define idempotency key; specify ordering and delivery semantics; add presence/rate limits; close with observability.