O2 · Design a Webhook Service (REST API)
Verified source
Original prompt: "System design Webhook service…caching, db design, focus on failure and retry mechanism in message queue. Implement the REST service with JSON body and query for GET and POST." — TeamBlind, 2024-04-10, screening round. Credibility C.
Requirements clarification
Unlike O1 (internal platform), this framing asks for a product-facing API. Make the GET/POST resource semantics explicit: resource paths, filtering, and pagination. Confirm the delivery guarantee (almost always at-least-once, which means receivers must tolerate duplicates) and the maximum retry window.
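Pinning down the maximum retry window matters because it bounds the whole retry schedule. The sketch below is illustrative only: it assumes exponential backoff with "equal jitter", and the defaults (1 s base, doubling, 1 h cap, 24 h window) and the name retry_delays are hypothetical choices, not numbers from the prompt.

```python
import random
from typing import List, Optional


def retry_delays(
    base_s: float = 1.0,
    factor: float = 2.0,
    cap_s: float = 3600.0,
    max_window_s: float = 24 * 3600.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Delays (seconds) before each retry, using exponential backoff with
    equal jitter, stopping before the cumulative delay exceeds max_window_s.
    Assumes base_s > 0 so every delay is positive and the loop terminates."""
    if rng is None:
        rng = random.Random()
    delays: List[float] = []
    ceiling = base_s
    elapsed = 0.0
    while True:
        # Equal jitter: keep at least half the backoff and randomize the rest,
        # so retries for many failed deliveries don't fire in lockstep.
        delay = rng.uniform(ceiling / 2, ceiling)
        if elapsed + delay > max_window_s:
            return delays  # window exhausted: the delivery goes to the dead-letter path
        delays.append(delay)
        elapsed += delay
        ceiling = min(cap_s, ceiling * factor)


if __name__ == "__main__":
    schedule = retry_delays(rng=random.Random(42))
    print(len(schedule), "retries over", round(sum(schedule) / 3600, 1), "hours")
```

With these defaults the ceiling reaches the 1 h cap after twelve retries; later retries land every 30 to 60 minutes until the window closes.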
High-level architecture
Split the read path (config lookups, dashboard) from the write path (delivery). Cache the read-heavy config layer, but avoid caching delivery results unless there is a clear hot-read pattern.
flowchart LR
    U[API Caller] --> G[REST API]
    G --> C[Config Cache]
    C -->|miss| D[(Config DB)]
    G --> Q[Delivery Queue]
    Q --> W[Workers]
    W --> T[Target URL]
    W --> L[(Delivery Log)]
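The write path in the diagram can be sketched in memory to make the at-least-once contract concrete. In this sketch the POST handler deduplicates on (subscription_id, idempotency_key). A worker records every attempt in an append-only log, re-enqueues on a non-2xx response, and dead-letters after a bounded number of attempts. DeliveryService, send, max_attempts, and the dead-letter list are hypothetical names. A real system would put a durable queue between API and workers and re-enqueue with a backoff delay rather than immediately.

```python
import uuid
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List, Tuple


@dataclass
class Delivery:
    delivery_id: str
    subscription_id: str
    payload: dict
    attempts: int = 0
    status: str = "pending"  # pending -> succeeded | dead_lettered


class DeliveryService:
    """Hypothetical in-memory stand-in for REST API -> Delivery Queue -> Workers -> Delivery Log."""

    def __init__(self, send: Callable[[Delivery], int], max_attempts: int = 5):
        self.send = send  # POSTs to the target URL and returns the HTTP status code
        self.max_attempts = max_attempts
        self.queue: Deque[str] = deque()
        self.deliveries: Dict[str, Delivery] = {}
        self.idem: Dict[Tuple[str, str], str] = {}  # (subscription_id, idempotency_key) -> delivery_id
        self.log: List[Tuple[str, int, int]] = []  # append-only: (delivery_id, attempt_no, status_code)
        self.dead_letters: List[str] = []

    def post_delivery(self, subscription_id: str, payload: dict, idempotency_key: str) -> str:
        """POST /deliveries: a retried client request returns the same delivery_id."""
        key = (subscription_id, idempotency_key)
        if key in self.idem:
            return self.idem[key]
        d = Delivery(str(uuid.uuid4()), subscription_id, payload)
        self.deliveries[d.delivery_id] = d
        self.idem[key] = d.delivery_id
        self.queue.append(d.delivery_id)
        return d.delivery_id

    def run_worker_once(self) -> bool:
        """Process one queued delivery; returns False when the queue is empty."""
        if not self.queue:
            return False
        d = self.deliveries[self.queue.popleft()]
        d.attempts += 1
        try:
            code = self.send(d)
        except Exception:
            code = 0  # timeout or connection error, recorded as 0 (an assumption)
        self.log.append((d.delivery_id, d.attempts, code))
        if 200 <= code < 300:
            d.status = "succeeded"
        elif d.attempts >= self.max_attempts:
            d.status = "dead_lettered"
            self.dead_letters.append(d.delivery_id)
        else:
            # At-least-once: put it back. A real worker would delay this by a backoff.
            self.queue.append(d.delivery_id)
        return True
```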
MVP resource model
WebhookSubscription(subscription_id, tenant_id, event_type, endpoint_id)
WebhookDelivery(delivery_id, subscription_id, event_id, status, last_attempt)
WebhookAttempt(attempt_id, delivery_id, status_code, latency_ms, timestamp)
API design
POST /subscriptions { event_type, endpoint_id }
POST /deliveries { subscription_id, payload, idempotency_key } → { delivery_id }
GET /deliveries?subscription_id=&status=&cursor=
GET /deliveries/{delivery_id}/attempts
How to talk about caching like an engineer
- Cache endpoint/subscription config: it is read-heavy, so use cache-aside, and on writes update the DB first, then invalidate the cache entry. The alternative, write-through, adds latency to every config write and is rarely worth it here.
- Don't cache delivery results, except for known hot debug pages, and even then only with a short TTL plus singleflight.
- Cache stampede mitigation: TTL jitter (randomized expiry), singleflight (one loader per key), and per-key rate limiting. Mention these proactively or you'll be asked.
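A minimal sketch of the config cache, assuming a single process with threads. It combines cache-aside reads, TTL jitter, and singleflight (concurrent misses on one key wait for a single loader), and its write path updates the DB and then invalidates. ConfigCache, update_config, and the 60 s TTL with ±20% jitter are hypothetical; per-key rate limiting is not shown. A distributed deployment would need a shared cache and a cross-process lock or lease instead of threading primitives.

```python
import random
import threading
import time
from typing import Any, Callable, Dict, Tuple


class ConfigCache:
    """Cache-aside for endpoint/subscription config with TTL jitter and singleflight."""

    def __init__(self, load: Callable[[str], Any], ttl_s: float = 60.0, jitter: float = 0.2):
        self._load = load  # reads one config row from the Config DB
        self._ttl_s = ttl_s
        self._jitter = jitter  # +/-20% so keys cached together don't expire together
        self._data: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)
        self._inflight: Dict[str, threading.Event] = {}
        self._lock = threading.Lock()

    def get(self, key: str) -> Any:
        while True:
            with self._lock:
                hit = self._data.get(key)
                if hit is not None and hit[1] > time.monotonic():
                    return hit[0]
                event = self._inflight.get(key)
                leader = event is None
                if leader:
                    event = self._inflight[key] = threading.Event()
            if not leader:
                event.wait()  # singleflight: wait for the one in-flight load
                continue  # then re-check the cache (or become the next leader)
            try:
                value = self._load(key)
                ttl = self._ttl_s * random.uniform(1 - self._jitter, 1 + self._jitter)
                with self._lock:
                    self._data[key] = (value, time.monotonic() + ttl)
                return value
            finally:
                with self._lock:
                    del self._inflight[key]
                event.set()

    def invalidate(self, key: str) -> None:
        with self._lock:
            self._data.pop(key, None)


def update_config(db: Dict[str, Any], cache: ConfigCache, key: str, value: Any) -> None:
    """Write path: DB first, then invalidate. A load that started before the write
    can still repopulate a stale value; version keys (etag/updated_at) close that gap."""
    db[key] = value
    cache.invalidate(key)
```

In a quick check with eight threads missing on the same key at once, only one load reaches the DB; the others wait on the Event and then read the freshly cached value.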
Consistency
- Config updates are eventually consistent (the cache may briefly serve stale values). Disabling an endpoint, however, must take effect quickly: push a disable signal carrying a version number to the workers, so they can discard stale or out-of-order updates.
- Delivery state uses an append-only attempt log plus a materialized view, avoiding write amplification.
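Both consistency points can be sketched in a few lines. This assumes the config service stamps each enable/disable change with a monotonically increasing integer version, and that the materialized view is a per-delivery summary derived by folding attempts. EndpointState, DeliveryLog, and fold are hypothetical names, and the in-memory view stands in for whatever table or database materialized view a real system would use.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


class EndpointState:
    """Worker-local enable/disable state; applies a pushed update only if its version is newer."""

    def __init__(self) -> None:
        self._state: Dict[str, Tuple[int, bool]] = {}  # endpoint_id -> (version, enabled)

    def apply(self, endpoint_id: str, version: int, enabled: bool) -> bool:
        current = self._state.get(endpoint_id)
        if current is not None and version <= current[0]:
            return False  # stale or duplicate push (e.g. reordered in transit): ignore
        self._state[endpoint_id] = (version, enabled)
        return True

    def is_enabled(self, endpoint_id: str) -> bool:
        # Assumption: endpoints with no pushed state are treated as enabled.
        return self._state.get(endpoint_id, (0, True))[1]


@dataclass(frozen=True)
class Attempt:
    attempt_id: int
    delivery_id: str
    status_code: int
    timestamp: float


def fold(view: Dict[str, dict], a: Attempt) -> None:
    """Apply one attempt to the delivery-state view (one row per delivery)."""
    row = view.setdefault(a.delivery_id, {"attempts": 0, "status": "pending", "last_attempt": None})
    row["attempts"] += 1
    row["last_attempt"] = a.timestamp
    row["status"] = "succeeded" if 200 <= a.status_code < 300 else "failing"


class DeliveryLog:
    """Append-only attempt log plus a materialized view derived from it."""

    def __init__(self) -> None:
        self.attempts: List[Attempt] = []
        self.view: Dict[str, dict] = {}

    def append(self, a: Attempt) -> None:
        self.attempts.append(a)  # log rows are never updated or deleted
        fold(self.view, a)

    def rebuild(self) -> Dict[str, dict]:
        """The view is disposable: replaying the log reproduces it."""
        view: Dict[str, dict] = {}
        for a in self.attempts:
            fold(view, a)
        return view
```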
Common follow-ups
- Cache invalidation racing with concurrent writes? Use a version key (etag/updated_at), invalidate after the DB write, and correct stale entries on read by comparing versions.
- How do you implement delay queues? Either a delayed topic or a scheduled_at column with a worker that pulls due items.
- How do you paginate the attempts list? Cursor-based (cursor=last_attempt_id); offset pagination skips or duplicates rows when new attempts are written between page fetches.