OpenAI · ★★★ Frequent · Hard · WebSocket · Fan-out · Chat

O4 · Design Slack

Verified source

Original prompt: "design slack…deliver a MVP in 2 weeks…message delivery scalability/reliability" — LeetCode, 2024-10-30. Also reported on Exponent and Glassdoor. Credibility C.

Framing the MVP correctly

A two-week MVP means ruthless scoping. In scope: 1:1 and small-group chat, send/receive, history fetch, online push via WebSocket, and basic auth. Out of scope: search, files, complex permissions, cross-device state sync, and read receipts.

Minimum architecture

flowchart LR
  C[Client] --> G[Gateway]
  G --> A[Auth]
  G --> M[Message Service]
  M --> D[(Message DB)]
  M --> Q[Fanout Queue]
  Q --> N[Notification/Push]
  N --> C

Semantics you MUST nail

  • Delivery guarantee: at-least-once is the standard choice; the client deduplicates on message_id.
  • Ordering: each channel has a monotonically increasing seq; there is no ordering guarantee across channels.
  • Offline: persist every message; on reconnect, the client pulls history since its last seq, then subscribes to the stream.
  • Multi-device: one user can have multiple sessions; the server tracks last_read_seq per session.
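
The delivery, ordering, and offline rules above can be sketched on the client side as follows. This is a minimal illustration, not a definitive implementation: the `ChannelReceiver` class and its field names are hypothetical, and it assumes seq starts at 1 and has no gaps within a channel.

```python
from dataclasses import dataclass, field


@dataclass
class ChannelReceiver:
    """Client-side state for one channel (hypothetical helper, not from the source)."""
    last_seq: int = 0                             # highest seq applied in order
    seen_ids: set = field(default_factory=set)    # message_ids already applied
    pending: dict = field(default_factory=dict)   # seq -> message waiting on a gap
    delivered: list = field(default_factory=list) # messages applied, in seq order

    def on_message(self, msg: dict) -> None:
        """Handle one message from the live stream or from a history pull."""
        # At-least-once delivery: drop redelivered or already-applied messages.
        if msg["message_id"] in self.seen_ids or msg["seq"] <= self.last_seq:
            return
        self.pending[msg["seq"]] = msg
        # Apply buffered messages only while the sequence stays contiguous.
        while self.last_seq + 1 in self.pending:
            nxt = self.pending.pop(self.last_seq + 1)
            self.seen_ids.add(nxt["message_id"])
            self.delivered.append(nxt)
            self.last_seq = nxt["seq"]

    def resume_cursor(self) -> int:
        """The seq to send on reconnect so the server returns everything after it."""
        return self.last_seq


rx = ChannelReceiver()
for m in [
    {"message_id": "a", "seq": 1, "text": "hi"},
    {"message_id": "c", "seq": 3, "text": "arrived early"},
    {"message_id": "a", "seq": 1, "text": "hi"},   # duplicate delivery
    {"message_id": "b", "seq": 2, "text": "yo"},   # fills the gap
]:
    rx.on_message(m)
```

After this run the receiver has applied the messages in seq order (a, b, c) and would resume from seq 3 on reconnect. Buffering out-of-order messages is one design choice; a client could instead trigger a history pull as soon as it detects a gap.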

API (MVP)

POST /channels/{id}/messages  { client_msg_id, text }
GET  /channels/{id}/messages?before=&limit=
GET  /ws  (or /events/stream)  — streaming new messages
                                heartbeat every 30s, cursor resume on reconnect

Data model

Message(channel_id, message_id, sender_id, created_at, payload, seq)
-- Index: (channel_id, seq DESC) for pagination
Channel(channel_id, type, members, next_seq)
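
As a rough sketch of how this model could back the two HTTP endpoints, the in-memory store below assigns seq from the channel's next_seq, treats a retried send as idempotent, and paginates newest-first. The `MessageStore` class is hypothetical, and so are two assumptions the source does not state: that `before` is a seq value, and that retries are keyed on (sender_id, client_msg_id).

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Channel:
    channel_id: str
    next_seq: int = 1                                 # next seq to hand out
    messages: list = field(default_factory=list)      # ascending by seq
    by_client_id: dict = field(default_factory=dict)  # (sender_id, client_msg_id) -> message


class MessageStore:
    """In-memory stand-in for the Message DB (illustration only)."""

    def __init__(self) -> None:
        self.channels: dict = {}

    def post_message(self, channel_id: str, sender_id: str,
                     client_msg_id: str, text: str) -> dict:
        ch = self.channels.setdefault(channel_id, Channel(channel_id))
        key = (sender_id, client_msg_id)
        # A retried POST with the same client_msg_id returns the original row.
        if key in ch.by_client_id:
            return ch.by_client_id[key]
        msg = {
            "channel_id": channel_id,
            "message_id": str(uuid.uuid4()),
            "sender_id": sender_id,
            "created_at": time.time(),
            "payload": text,
            "seq": ch.next_seq,
        }
        ch.next_seq += 1
        ch.messages.append(msg)
        ch.by_client_id[key] = msg
        return msg

    def get_messages(self, channel_id: str, before: Optional[int] = None,
                     limit: int = 50) -> list:
        """Return up to `limit` messages with seq < before, newest first."""
        ch = self.channels.get(channel_id)
        if ch is None:
            return []
        newest_first = [m for m in reversed(ch.messages)
                        if before is None or m["seq"] < before]
        return newest_first[:limit]
```

In a real database the seq allocation and the insert would need to happen atomically (for example in one transaction per channel); the single-threaded dictionary here sidesteps that concern.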

Scale & consistency

  • Strict per-channel ordering means each channel's writes go through a single partition, which caps that channel's throughput. Mitigation: shard by channel_id so different channels write to different partitions; use the claim-check pattern for attachments (store the payload externally and pass only a reference in the message).
  • Fanout-on-write (push to each user's inbox) vs. fanout-on-read (aggregate at read time): the MVP uses fanout-on-read for small groups; production systems mix the two to handle big channels.
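
Sharding by channel_id can be illustrated with a stable hash that always maps a channel to the same partition, which keeps that channel's writes ordered in one place. The `partition_for` function and the choice of SHA-256 are assumptions made for this sketch, not something the source specifies.

```python
import hashlib


def partition_for(channel_id: str, num_partitions: int) -> int:
    """Map a channel to a partition index in [0, num_partitions)."""
    # hashlib gives a hash that is stable across processes and restarts,
    # unlike Python's built-in hash() for strings.
    digest = hashlib.sha256(channel_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

Plain modulo hashing remaps most channels whenever num_partitions changes; consistent hashing is a common alternative when partitions are added or removed often.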

Hot channel / big group

A 10,000-member channel can trigger a fanout storm. Solution: push to online users in real time and let offline users pull on reconnect; big channels use a layered topic.
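
One way to picture "push only to online users" is a registry of live connections per channel, so that publish cost scales with the number of online subscribers rather than with total membership. The `OnlineFanout` class and its `send` callbacks are hypothetical stand-ins for real WebSocket sessions; offline users are simply absent from the registry and catch up through the history endpoint when they reconnect.

```python
from typing import Callable, Dict

Send = Callable[[dict], None]  # stand-in for "write this message to a socket"


class OnlineFanout:
    """Tracks online subscribers per channel (illustration only)."""

    def __init__(self) -> None:
        self._online: Dict[str, Dict[str, Send]] = {}  # channel_id -> {user_id: send}

    def subscribe(self, channel_id: str, user_id: str, send: Send) -> None:
        self._online.setdefault(channel_id, {})[user_id] = send

    def unsubscribe(self, channel_id: str, user_id: str) -> None:
        self._online.get(channel_id, {}).pop(user_id, None)

    def publish(self, channel_id: str, msg: dict) -> int:
        """Push msg to every online subscriber; return how many pushes were made."""
        # Copy the targets so a callback that unsubscribes cannot break iteration.
        targets = list(self._online.get(channel_id, {}).values())
        for send in targets:
            send(msg)
        return len(targets)


inbox_a, inbox_b = [], []
fanout = OnlineFanout()
fanout.subscribe("big", "alice", inbox_a.append)
fanout.subscribe("big", "bob", inbox_b.append)
fanout.unsubscribe("big", "bob")              # bob goes offline
pushed = fanout.publish("big", {"seq": 1, "text": "hello"})
```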

Follow-ups

  1. Exactly-once? Not in messaging; deduplicate on client_msg_id instead.
  2. Edit/delete? Emit a new event (edit_event) carrying the same message_id; the client replays events to arrive at the current state.
  3. Presence? Heartbeats refresh a short-TTL cache entry; online/offline status is eventually consistent.
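
The edit/delete answer can be sketched as a fold over the channel's event log. The `replay` function is hypothetical, `delete_event` is an assumed event name (the source only names `edit_event`), and the sketch assumes events arrive already deduplicated and in seq order.

```python
def replay(events: list) -> dict:
    """Fold an ordered event log into the current view: message_id -> text."""
    view: dict = {}
    for ev in events:
        mid = ev["message_id"]
        if ev["type"] == "message":
            view[mid] = ev["text"]
        elif ev["type"] == "edit_event" and mid in view:
            view[mid] = ev["text"]   # same message_id, new content
        elif ev["type"] == "delete_event":
            view.pop(mid, None)      # remove the message from the view
    return view


log = [
    {"type": "message", "message_id": "m1", "text": "helo"},
    {"type": "message", "message_id": "m2", "text": "draft"},
    {"type": "edit_event", "message_id": "m1", "text": "hello"},
    {"type": "delete_event", "message_id": "m2"},
]
current = replay(log)
```

Here `current` ends up holding only m1 with its edited text. Because edits and deletes are ordinary events in the same per-channel sequence, they reuse the delivery and ordering rules already defined for messages.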

Related study-guide topics