O4 · Design Slack
Verified source
Original prompt: "design slack…deliver a MVP in 2 weeks…message delivery scalability/reliability" — LeetCode, 2024-10-30. Also reported on Exponent and Glassdoor. Credibility C.
Framing the MVP correctly
A two-week MVP means ruthless scoping. In scope: 1:1 and small-group chat, send/receive, history fetch, online push via WebSocket, and basic auth. Out of scope: search, files, complex permissions, cross-device state sync, and read receipts.
Minimum architecture
flowchart LR
    C[Client] --> G[Gateway]
    G --> A[Auth]
    G --> M[Message Service]
    M --> D[(Message DB)]
    M --> Q[Fanout Queue]
    Q --> N[Notification/Push]
    N --> C
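To make the diagram's write path concrete, here is an in-process Python sketch, assuming a list stands in for Message DB, a deque for Fanout Queue, and per-user callbacks for live WebSocket connections (Pipeline, send, and drain_queue are hypothetical names). It persists before enqueueing, a design choice that pairs with the offline rule: if a push is lost, the message is still durable and the client recovers it by pulling history.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List


@dataclass
class Message:
    channel_id: str
    seq: int
    text: str


@dataclass
class Pipeline:
    # Hypothetical stand-ins: a list for Message DB, a deque for Fanout Queue.
    db: List[Message] = field(default_factory=list)
    queue: Deque[Message] = field(default_factory=deque)
    # user_id -> push callback, present only while the user has a live connection
    connections: Dict[str, Callable[[Message], None]] = field(default_factory=dict)
    # channel_id -> member user_ids
    members: Dict[str, List[str]] = field(default_factory=dict)

    def send(self, msg: Message) -> None:
        # Persist first, then enqueue: a crash between the two loses only the
        # push, never the message itself.
        self.db.append(msg)
        self.queue.append(msg)

    def drain_queue(self) -> int:
        """Notification/Push worker: deliver queued messages to online members."""
        pushed = 0
        while self.queue:
            msg = self.queue.popleft()
            for user_id in self.members.get(msg.channel_id, []):
                push = self.connections.get(user_id)
                if push is not None:  # offline members are skipped; they pull later
                    push(msg)
                    pushed += 1
        return pushed
```

In a real deployment the queue would be a durable broker and the worker a separate process; the in-memory version only illustrates the ordering of persist and fanout.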
Semantics you MUST nail
- Delivery guarantee: at-least-once is standard; the client dedups on message_id.
- Ordering: seq increases monotonically within each channel; there is no cross-channel ordering.
- Offline: persist all messages; on reconnect, pull history since the last seen seq and subscribe to the stream.
- Multi-device: one user can hold multiple sessions; the server tracks last_read_seq per session.
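The client side of these rules can be sketched as a small per-channel view, assuming messages arrive as (message_id, seq, text) and that a gap is handled by re-pulling history rather than buffering out-of-order messages (ChannelView and its methods are hypothetical).

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ChannelView:
    """Client-side state for one channel (hypothetical helper)."""

    last_seq: int = 0  # highest seq applied with no gaps before it
    seen_ids: Set[str] = field(default_factory=set)
    messages: List[str] = field(default_factory=list)
    needs_backfill: bool = False  # set on a gap; the caller clears it after backfill

    def apply(self, message_id: str, seq: int, text: str) -> bool:
        """Apply one pushed message; return True only if it was new and in order."""
        if message_id in self.seen_ids or seq <= self.last_seq:
            return False  # redelivery under at-least-once: drop it
        if seq != self.last_seq + 1:
            # Gap: something was missed. Don't apply; re-pull history after last_seq.
            self.needs_backfill = True
            return False
        self.seen_ids.add(message_id)
        self.messages.append(text)
        self.last_seq = seq
        return True

    def resume_cursor(self) -> int:
        """Reconnect cursor: fetch history with seq > last_seq, then resubscribe."""
        return self.last_seq
```

Because seq is gap-free per channel, the client can detect a missed message locally; message_id dedup alone could not tell a duplicate from a gap.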
API (MVP)
POST /channels/{id}/messages { client_msg_id, text }
GET /channels/{id}/messages?before=&limit=
GET /ws (or /events/stream) — streaming new messages
heartbeat every 30s, cursor resume on reconnect
Data model
Message(channel_id, message_id, sender_id, created_at, payload, seq)
-- Index: (channel_id, seq DESC) for pagination
Channel(channel_id, type, members, next_seq)
Scale & consistency
- Strict per-channel ordering means every write for a channel goes through a single partition, which caps that channel's write throughput. Mitigation: shard by channel_id so the aggregate load spreads across partitions, and use the claim-check pattern for attachments (store the file in object storage and put only a reference in the message) to keep the ordered write path small.
- Fanout-on-write (copy each message into every recipient's inbox) vs fanout-on-read (store each message once per channel and aggregate at read time). The MVP uses fanout-on-read for small groups; production systems mix the two strategies to handle big channels.
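The data model and the first scaling point can be sketched together with Python's built-in sqlite3, assuming one in-memory database per shard (NUM_SHARDS, SHARDS, shard_for, create_channel, append_message, and page are hypothetical names; Channel.members is omitted, and before= is treated as a seq cursor). Allocating seq by bumping Channel.next_seq in the same transaction as the insert is what makes per-channel order strict, and it is also why one channel's writes cannot spread beyond its own partition.

```python
import hashlib
import sqlite3
from typing import List, Optional, Tuple

NUM_SHARDS = 4  # hypothetical shard count

SCHEMA = """
CREATE TABLE channel (
    channel_id TEXT PRIMARY KEY,
    type       TEXT NOT NULL,
    next_seq   INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE message (
    message_id TEXT PRIMARY KEY,
    channel_id TEXT NOT NULL,
    sender_id  TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    payload    TEXT NOT NULL,
    seq        INTEGER NOT NULL
);
-- Pagination index from the data model; UNIQUE also rejects duplicate seqs.
CREATE UNIQUE INDEX idx_message_channel_seq ON message (channel_id, seq DESC);
"""


def shard_for(channel_id: str) -> int:
    # Stable hash: Python's built-in hash() is salted per process.
    digest = hashlib.sha256(channel_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


# One in-memory database per shard stands in for separate DB partitions.
SHARDS = [sqlite3.connect(":memory:") for _ in range(NUM_SHARDS)]
for shard in SHARDS:
    shard.executescript(SCHEMA)


def create_channel(channel_id: str, type_: str = "group") -> None:
    conn = SHARDS[shard_for(channel_id)]
    with conn:  # commits on success
        conn.execute("INSERT INTO channel (channel_id, type) VALUES (?, ?)", (channel_id, type_))


def append_message(channel_id: str, message_id: str, sender_id: str, payload: str) -> int:
    conn = SHARDS[shard_for(channel_id)]
    with conn:  # one transaction: commits on success, rolls back on error
        # Bump next_seq before reading it, so this transaction is already a
        # writer and no concurrent writer can read the same value.
        cur = conn.execute(
            "UPDATE channel SET next_seq = next_seq + 1 WHERE channel_id = ?", (channel_id,)
        )
        if cur.rowcount == 0:
            raise KeyError(f"unknown channel {channel_id}")
        (seq,) = conn.execute(
            "SELECT next_seq - 1 FROM channel WHERE channel_id = ?", (channel_id,)
        ).fetchone()
        conn.execute(
            "INSERT INTO message (message_id, channel_id, sender_id, payload, seq) "
            "VALUES (?, ?, ?, ?, ?)",
            (message_id, channel_id, sender_id, payload, seq),
        )
    return seq


def page(channel_id: str, before: Optional[int] = None, limit: int = 50) -> List[Tuple[int, str]]:
    """Newest-first page of (seq, payload) with seq < before."""
    conn = SHARDS[shard_for(channel_id)]
    upper = before if before is not None else 1 << 62
    return conn.execute(
        "SELECT seq, payload FROM message WHERE channel_id = ? AND seq < ? "
        "ORDER BY seq DESC LIMIT ?",
        (channel_id, upper, limit),
    ).fetchall()
```

Plain modulo routing reshuffles channels when NUM_SHARDS changes; the sketch ignores resharding.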
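The two fanout strategies differ mainly in where the per-member work lands. A minimal in-memory comparison, assuming channel membership is a plain dict (FanoutOnWrite and FanoutOnRead are hypothetical): fanout-on-write pays one write per member at send time, while fanout-on-read pays one write per message and aggregates when a user reads.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List


class FanoutOnWrite:
    """Copy each message into every member's inbox at send time."""

    def __init__(self, members: Dict[str, List[str]]) -> None:
        self.members = members
        self.inbox: DefaultDict[str, List[str]] = defaultdict(list)
        self.writes = 0

    def send(self, channel_id: str, msg: str) -> None:
        for user in self.members[channel_id]:
            self.inbox[user].append(msg)
            self.writes += 1

    def read(self, user: str) -> List[str]:
        return list(self.inbox[user])


class FanoutOnRead:
    """Store each message once per channel; aggregate at read time."""

    def __init__(self, members: Dict[str, List[str]]) -> None:
        self.members = members
        self.channel_log: DefaultDict[str, List[str]] = defaultdict(list)
        self.writes = 0

    def send(self, channel_id: str, msg: str) -> None:
        self.channel_log[channel_id].append(msg)
        self.writes += 1

    def read(self, user: str) -> List[str]:
        # Linear scan over channels for brevity; a real system would index membership.
        out: List[str] = []
        for channel_id, users in self.members.items():
            if user in users:
                out.extend(self.channel_log[channel_id])
        return out
```

With three members, one send costs three writes under fanout-on-write and one under fanout-on-read; that per-member multiplier is what makes write fanout expensive for very large channels.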
Hot channel / big group
A 10,000-member channel can cause a fanout storm. Solution: push in real time only to online users and let offline users pull on reconnect; big channels use a layered topic.
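A minimal sketch of the online/offline split, assuming a presence lookup that maps each online user to the gateway node holding their connection. Reading "layered topic" as "publish once per gateway, then each gateway pushes to its local connections" is an assumed interpretation, and plan_fanout is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List


def plan_fanout(members: List[str], online: Dict[str, str]) -> Dict[str, List[str]]:
    """Return {gateway_id: [online members connected to it]}.

    online maps user_id -> gateway_id for users with a live connection
    (hypothetical presence lookup). Offline members are skipped entirely:
    they catch up by pulling history on reconnect.
    """
    per_gateway: Dict[str, List[str]] = defaultdict(list)
    for user in members:
        gateway = online.get(user)
        if gateway is not None:
            per_gateway[gateway].append(user)
    return dict(per_gateway)
```

In a hypothetical channel of 10,000 members with 1,000 online spread evenly over 4 gateways, the message service publishes 4 times instead of 10,000, and each gateway handles its 250 local pushes.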
Follow-ups
- Exactly-once? Not in messaging; combine at-least-once delivery with client_msg_id dedup instead.
- Edit/delete? Emit a new event (edit_event) carrying the same message_id; clients replay events to rebuild the current state.
- Presence? Heartbeat → short-TTL cache; online/offline status is eventually consistent.
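Server-side, the client_msg_id dedup can be sketched as an in-memory version of the MVP endpoints, assuming sender_id comes from the auth token, dedup is keyed by (channel_id, sender_id, client_msg_id), and before= is a seq cursor (MessageAPI, post, get, and since are hypothetical). A retried POST returns the original message instead of inserting a duplicate, so at-least-once retries behave idempotently.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class StoredMessage:
    message_id: str
    channel_id: str
    seq: int
    text: str


class MessageAPI:
    """In-memory sketch of the MVP message endpoints (hypothetical)."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._next_seq: Dict[str, int] = {}
        self._log: Dict[str, List[StoredMessage]] = {}
        # (channel_id, sender_id, client_msg_id) -> the message it produced
        self._dedup: Dict[Tuple[str, str, str], StoredMessage] = {}

    def post(self, channel_id: str, sender_id: str, client_msg_id: str, text: str) -> StoredMessage:
        """POST /channels/{id}/messages { client_msg_id, text }."""
        key = (channel_id, sender_id, client_msg_id)
        if key in self._dedup:
            return self._dedup[key]  # retry after a timeout: no second insert
        seq = self._next_seq.get(channel_id, 0) + 1
        self._next_seq[channel_id] = seq
        msg = StoredMessage(f"m{next(self._ids)}", channel_id, seq, text)
        self._log.setdefault(channel_id, []).append(msg)
        self._dedup[key] = msg
        return msg

    def get(self, channel_id: str, before: Optional[int] = None, limit: int = 50) -> List[StoredMessage]:
        """GET /channels/{id}/messages?before=&limit= (newest first)."""
        log = self._log.get(channel_id, [])
        older = [m for m in reversed(log) if before is None or m.seq < before]
        return older[:limit]

    def since(self, channel_id: str, cursor: int) -> List[StoredMessage]:
        """Reconnect resume: messages after the client's last seq, oldest first."""
        return [m for m in self._log.get(channel_id, []) if m.seq > cursor]
```

The dedup map here grows without bound; a real store would more likely expire entries after a bounded retry window.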
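The edit/delete and presence follow-ups can be sketched together, assuming events arrive in seq order, a delete is an edit_event with a hypothetical deleted flag, and the presence TTL is 90 s (three missed 30 s heartbeats, an assumed policy). replay folds events by message_id so the latest one wins; Presence treats a user as online until the TTL from their last heartbeat runs out, which is why presence is only eventually consistent.

```python
import time
from typing import Callable, Dict, List, Optional


def replay(events: List[dict]) -> Dict[str, Optional[str]]:
    """Fold a channel's events (in seq order) into current text per message_id."""
    state: Dict[str, Optional[str]] = {}
    for ev in events:
        if ev.get("deleted"):
            state[ev["message_id"]] = None  # tombstone for a delete
        else:
            # Original message or edit_event: same message_id, latest text wins.
            state[ev["message_id"]] = ev["text"]
    return state


class Presence:
    """Heartbeat-driven presence in a short-TTL map (stand-in for a cache)."""

    def __init__(self, ttl_s: float = 90.0, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_s = ttl_s
        self.clock = clock
        self._expires_at: Dict[str, float] = {}

    def heartbeat(self, user_id: str) -> None:
        self._expires_at[user_id] = self.clock() + self.ttl_s

    def is_online(self, user_id: str) -> bool:
        # A user whose connection just died still reads as online until expiry;
        # that window is the eventual consistency of presence.
        expires = self._expires_at.get(user_id)
        return expires is not None and self.clock() < expires
```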