OpenAI · ★★★ · Frequent · Hard · REST · Caching · Queue

O2 · Design a Webhook Service (REST API)

Verified source

Original prompt: "System design Webhook service…caching, db design, focus on failure and retry mechanism in message queue. Implement the REST service with JSON body and query for GET and POST." — TeamBlind, 2024-04-10, screening round. Credibility C.

Requirements clarification

Unlike O1 (internal platform), this framing treats webhooks as a product API. Make the GET/POST resource semantics explicit: resource paths, filtering, and pagination. Confirm whether delivery is at-least-once (almost always yes) and what the maximum retry window is.
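Because delivery is at-least-once with a bounded retry window, it helps to make the retry schedule concrete. A minimal sketch, assuming exponential backoff with "full jitter" (a common choice, not something the prompt specifies) and hypothetical defaults of a 1 s base wait, a 1 h cap per wait, and a 72 h window; `retry_delays` is an illustrative name.

```python
import random
from typing import List, Optional


def retry_delays(base_s: float = 1.0, cap_s: float = 3600.0,
                 window_s: float = 72 * 3600.0,
                 rng: Optional[random.Random] = None) -> List[float]:
    """Waits before each retry, using exponential backoff with full jitter.

    Stops as soon as the next wait would push the total past the retry window.
    """
    rng = rng or random.Random()
    delays: List[float] = []
    elapsed = 0.0
    attempt = 0
    while True:
        # Ceiling doubles per attempt up to cap_s; the exponent is clamped to avoid float overflow.
        ceiling = min(cap_s, base_s * 2 ** min(attempt, 32))
        delay = rng.uniform(0.0, ceiling)  # full jitter: anywhere in [0, ceiling]
        if elapsed + delay > window_s:
            return delays  # window exhausted: the delivery would be dead-lettered here
        delays.append(delay)
        elapsed += delay
        attempt += 1
```

The jitter keeps many deliveries that failed at the same moment from retrying in lockstep against an endpoint that is just recovering.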

High-level architecture

Split the read path (config lookups, dashboard) from the write path (delivery). Cache the read-heavy config layer, but avoid caching delivery results unless there is a clear hot-read pattern.

flowchart LR
  U[API Caller] --> G[REST API]
  G --> C[Config Cache]
  C -->|miss| D[(Config DB)]
  G --> Q[Delivery Queue]
  Q --> W[Workers]
  W --> T[Target URL]
  W --> L[(Delivery Log)]
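A sketch of the Workers box in the diagram, assuming an in-memory `deque` stands in for the Delivery Queue, a plain list for the Delivery Log, and injected `get_config`/`send` callables. `Task`, `run_worker`, the `disabled` config flag, and `max_attempts=5` are hypothetical. It shows the read path (config through the cache) and the write path (append an attempt, then requeue or dead-letter) touching different stores.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Task:
    delivery_id: str
    endpoint_id: str
    payload: dict
    attempt: int = 0


def run_worker(queue: deque,  # collections.deque of Task
               get_config: Callable[[str], dict],
               send: Callable[[str, dict], int],
               log: List[Tuple[str, int, int]],
               dead_letters: List[Task],
               max_attempts: int = 5) -> None:
    """Drain the queue once: deliver, record every attempt, then requeue or dead-letter."""
    while queue:
        task = queue.popleft()
        config = get_config(task.endpoint_id)  # read path: served by the config cache
        if config.get("disabled"):
            dead_letters.append(task)  # park it so it can be replayed after re-enable
            continue
        try:
            status = send(config["url"], task.payload)
        except OSError:
            status = 0  # connection error, recorded as status 0
        log.append((task.delivery_id, task.attempt, status))  # write path: Delivery Log
        if 200 <= status < 300:
            continue
        task.attempt += 1
        if task.attempt >= max_attempts:
            dead_letters.append(task)  # retries exhausted
        else:
            queue.append(task)  # a real worker would re-enqueue with a backoff delay
```

Requeueing only after the attempt is logged is what makes the delivery at-least-once: a crash between `send` and the log write means the task is retried, possibly duplicating a delivery the target already received.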

MVP resource model

WebhookSubscription(subscription_id, tenant_id, event_type, endpoint_id)
WebhookDelivery(delivery_id, subscription_id, event_id, status, last_attempt)
WebhookAttempt(attempt_id, delivery_id, status_code, latency_ms, timestamp)
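The same model written as Python dataclasses, as a sketch: the source lists only field names, so the field types and the `DeliveryStatus` values below are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class DeliveryStatus(str, Enum):
    PENDING = "pending"
    SUCCEEDED = "succeeded"
    FAILED = "failed"  # retries exhausted; candidate for dead-lettering


@dataclass(frozen=True)
class WebhookSubscription:
    subscription_id: str
    tenant_id: str
    event_type: str
    endpoint_id: str


@dataclass
class WebhookDelivery:
    delivery_id: str
    subscription_id: str
    event_id: str
    status: DeliveryStatus = DeliveryStatus.PENDING
    last_attempt: Optional[datetime] = None  # None until the first attempt


@dataclass(frozen=True)
class WebhookAttempt:  # immutable: attempts are only ever appended
    attempt_id: str
    delivery_id: str
    status_code: Optional[int]  # None when the request never got a response
    latency_ms: int
    timestamp: datetime
```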

API design

POST /subscriptions  { event_type, endpoint_id }
POST /deliveries     { subscription_id, payload, idempotency_key } → { delivery_id }
GET  /deliveries?subscription_id=&status=&cursor=
GET  /deliveries/{delivery_id}/attempts
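The prompt asks for the REST service itself, with JSON bodies and query parameters. A framework-free sketch of the routes above, assuming in-memory dicts in place of the databases; `handle`, the `limit` parameter, idempotency keys scoped per subscription, and the 201/202 status codes are hypothetical choices. Cursor pagination relies on zero-padded, monotonically increasing IDs so that comparing ID strings matches creation order.

```python
import itertools
import json
from typing import Dict, List, Tuple
from urllib.parse import parse_qs, urlsplit

# Hypothetical in-memory stand-ins for the Config DB and the Delivery Log.
SUBSCRIPTIONS: Dict[str, dict] = {}
DELIVERIES: Dict[str, dict] = {}              # insertion order == creation order
ATTEMPTS: Dict[str, List[dict]] = {}
IDEMPOTENCY: Dict[Tuple[str, str], str] = {}  # (subscription_id, idempotency_key) -> delivery_id
_ids = itertools.count(1)


def _new_id(prefix: str) -> str:
    # Zero-padded counter, so string comparison matches creation order.
    return f"{prefix}_{next(_ids):08d}"


def handle(method: str, url: str, body: bytes = b"") -> Tuple[int, dict]:
    """Route one request; returns (HTTP status, JSON-serializable body)."""
    parts = urlsplit(url)
    path = [p for p in parts.path.split("/") if p]
    # parse_qs drops blank values, so "?status=" means "no status filter".
    query = {k: v[0] for k, v in parse_qs(parts.query).items()}
    try:
        data = json.loads(body) if body else {}
    except json.JSONDecodeError:
        return 400, {"error": "body must be JSON"}

    if method == "POST" and path == ["subscriptions"]:
        if not all(k in data for k in ("event_type", "endpoint_id")):
            return 400, {"error": "event_type and endpoint_id are required"}
        sub_id = _new_id("sub")
        SUBSCRIPTIONS[sub_id] = {"subscription_id": sub_id,
                                 "event_type": data["event_type"],
                                 "endpoint_id": data["endpoint_id"]}
        return 201, SUBSCRIPTIONS[sub_id]

    if method == "POST" and path == ["deliveries"]:
        sub_id, key = data.get("subscription_id"), data.get("idempotency_key")
        if sub_id not in SUBSCRIPTIONS or not key:
            return 400, {"error": "unknown subscription_id or missing idempotency_key"}
        if (sub_id, key) in IDEMPOTENCY:  # client retry: return the original delivery
            return 200, {"delivery_id": IDEMPOTENCY[(sub_id, key)]}
        d_id = _new_id("dlv")
        DELIVERIES[d_id] = {"delivery_id": d_id, "subscription_id": sub_id,
                            "payload": data.get("payload"), "status": "pending"}
        ATTEMPTS[d_id] = []
        IDEMPOTENCY[(sub_id, key)] = d_id
        return 202, {"delivery_id": d_id}  # accepted; workers deliver asynchronously

    if method == "GET" and path == ["deliveries"]:
        limit = max(1, min(int(query.get("limit", "50")), 100))
        cursor = query.get("cursor", "")
        rows = [d for d_id, d in DELIVERIES.items()
                if d_id > cursor
                and query.get("subscription_id", d["subscription_id"]) == d["subscription_id"]
                and query.get("status", d["status"]) == d["status"]]
        page = rows[:limit]
        next_cursor = page[-1]["delivery_id"] if len(rows) > limit else None
        return 200, {"items": page, "next_cursor": next_cursor}

    if method == "GET" and len(path) == 3 and path[0] == "deliveries" and path[2] == "attempts":
        if path[1] not in DELIVERIES:
            return 404, {"error": "delivery not found"}
        return 200, {"items": ATTEMPTS[path[1]]}

    return 404, {"error": "not found"}
```

In a real service `handle` would sit behind a web framework or `http.server.BaseHTTPRequestHandler`, and POST /deliveries would also enqueue the new delivery for the workers.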

How to talk about caching like you have actually built it

Cache endpoint/subscription config: it is read-heavy, so use cache-aside with write-DB-then-invalidate. Write-through is the alternative, but it adds write latency and is rarely worth it here.
Don't cache delivery results, except for clearly hot debug pages, and even then only with a short TTL plus singleflight.
Cache stampede mitigation: TTL jitter, singleflight, and per-key rate limiting. Mention these up front, or you will be asked.
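A minimal cache-aside sketch for the config layer, assuming a single process with threads. `ConfigCache`, its `load` callable, and the 60 s TTL with +/-20% jitter are hypothetical. The per-key lock is a lock-based approximation of singleflight: concurrent misses on the same key trigger one DB read, and the other callers reuse the entry it fills.

```python
import random
import threading
import time
from typing import Callable, Dict, Tuple


class ConfigCache:
    """Cache-aside over a config loader, with TTL jitter and per-key singleflight."""

    def __init__(self, load: Callable[[str], dict], ttl_s: float = 60.0, jitter: float = 0.2):
        self._load = load      # reads one config row from the Config DB
        self._ttl_s = ttl_s
        self._jitter = jitter  # spreads expiries so hot keys don't all expire together
        self._entries: Dict[str, Tuple[float, dict]] = {}  # key -> (expires_at, value)
        self._key_locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()

    def get(self, key: str) -> dict:
        hit = self._entries.get(key)
        if hit and hit[0] > time.monotonic():
            return hit[1]
        with self._guard:
            lock = self._key_locks.setdefault(key, threading.Lock())
        with lock:  # singleflight: at most one loader per key at a time
            hit = self._entries.get(key)  # another thread may have filled it while we waited
            if hit and hit[0] > time.monotonic():
                return hit[1]
            value = self._load(key)
            ttl = self._ttl_s * random.uniform(1 - self._jitter, 1 + self._jitter)
            self._entries[key] = (time.monotonic() + ttl, value)
            return value

    def invalidate(self, key: str) -> None:
        """Call after the DB write commits (write-DB-then-invalidate)."""
        self._entries.pop(key, None)
```

`invalidate` alone does not close the race where a load that started before the write repopulates the old value; that is the version-key follow-up below.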

Consistency

Config updates are eventually consistent (the cache may be briefly stale). disable_endpoint, however, must take effect quickly: push a disable signal carrying a version number to the workers, so a late or duplicate signal can be recognized and ignored.
Delivery state is kept as an append-only attempt log plus a materialized view of each delivery's current status, to avoid write amplification.
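Both consistency rules can be sketched in a few lines, assuming a per-worker in-memory view. `WorkerEndpointState` ignores pushed enable/disable signals whose version is not newer than the one it already holds, and `AttemptLog` only ever appends attempt rows while folding each row into a per-delivery status view. All names, the `(version, disabled)` signal shape, and the status strings are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


class WorkerEndpointState:
    """Worker-local enable/disable view, updated by pushed, versioned signals."""

    def __init__(self) -> None:
        self._state: Dict[str, Tuple[int, bool]] = {}  # endpoint_id -> (version, disabled)

    def apply(self, endpoint_id: str, version: int, disabled: bool) -> bool:
        """Apply a signal; ignore it if an equal or newer version was already applied."""
        current = self._state.get(endpoint_id)
        if current is not None and current[0] >= version:
            return False  # stale or duplicate push
        self._state[endpoint_id] = (version, disabled)
        return True

    def is_disabled(self, endpoint_id: str) -> bool:
        return self._state.get(endpoint_id, (0, False))[1]


@dataclass
class AttemptLog:
    """Append-only attempt rows plus a per-delivery status view folded from them."""
    rows: List[Tuple[str, int, int]] = field(default_factory=list)  # (delivery_id, attempt_no, status_code)
    status_view: Dict[str, Tuple[int, str]] = field(default_factory=dict)  # delivery_id -> (attempt_no, status)

    def append(self, delivery_id: str, attempt_no: int, status_code: int) -> None:
        self.rows.append((delivery_id, attempt_no, status_code))  # rows are never updated in place
        status = "succeeded" if 200 <= status_code < 300 else "failed"
        current = self.status_view.get(delivery_id)
        if current is None or attempt_no >= current[0]:  # highest attempt number wins
            self.status_view[delivery_id] = (attempt_no, status)
```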

Common follow-ups

Cache invalidation racing with concurrent writes? Use a version key (etag/updated_at), invalidate after the DB write, and correct stale entries on read when needed.
How do you implement delay queues? Either a delayed topic, or a scheduled_at column plus a worker that pulls due items.
How do you paginate the attempts list? Cursor-based (cursor=last_attempt_id); offset pagination drifts under a high write rate, so pages can repeat or skip rows.
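For the scheduled_at variant of the delay-queue answer, a sketch using the standard-library `sqlite3` module; the `delivery_queue` table, `schedule`, and `claim_due` are hypothetical. A failed attempt is rescheduled by writing a future `scheduled_at`, and a worker claims only rows that are due and unclaimed. With several worker processes on PostgreSQL, the claim step would typically use `SELECT ... FOR UPDATE SKIP LOCKED` rather than the claim-then-check pattern shown here.

```python
import sqlite3
import time
from typing import List, Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS delivery_queue (
    delivery_id  TEXT PRIMARY KEY,
    scheduled_at REAL NOT NULL,  -- epoch seconds when the next attempt is due
    claimed_by   TEXT            -- NULL while waiting to be picked up
);
CREATE INDEX IF NOT EXISTS idx_due ON delivery_queue (scheduled_at) WHERE claimed_by IS NULL;
"""


def schedule(db: sqlite3.Connection, delivery_id: str, delay_s: float,
             now: Optional[float] = None) -> None:
    """Enqueue or reschedule a delivery so it becomes due after delay_s seconds."""
    now = time.time() if now is None else now
    with db:  # commits on success
        db.execute(
            "INSERT OR REPLACE INTO delivery_queue (delivery_id, scheduled_at, claimed_by) "
            "VALUES (?, ?, NULL)",
            (delivery_id, now + delay_s),
        )


def claim_due(db: sqlite3.Connection, worker_id: str, limit: int = 10,
              now: Optional[float] = None) -> List[str]:
    """Claim up to `limit` due, unclaimed rows for this worker; return their delivery_ids."""
    now = time.time() if now is None else now
    claimed: List[str] = []
    with db:  # commits on success, rolls back on error
        rows = db.execute(
            "SELECT delivery_id FROM delivery_queue "
            "WHERE claimed_by IS NULL AND scheduled_at <= ? "
            "ORDER BY scheduled_at LIMIT ?",
            (now, limit),
        ).fetchall()
        for (delivery_id,) in rows:
            cur = db.execute(
                "UPDATE delivery_queue SET claimed_by = ? "
                "WHERE delivery_id = ? AND claimed_by IS NULL",
                (worker_id, delivery_id),
            )
            if cur.rowcount == 1:  # 0 means another worker claimed it first
                claimed.append(delivery_id)
    return claimed
```

After a failed attempt the worker calls `schedule` again with the next backoff delay, which both pushes `scheduled_at` forward and releases the claim.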