O3 · Webhook Platform with External URL Lookup (24h Retry)
Verified source
Original prompt: "implement a webhook platform…create a webhook request. (cxid, json blob). url is queried from serviceB. retry for 24 hours." — LeetCode, 2024-10-30, screening system design. Credibility C.
What makes this different from O1/O2
The URL is not yours: it lives in ServiceB. That creates two risks: (1) which version of the URL you bind to an event, and (2) ServiceB unavailability cascading into your ingest path.
Key design decisions
- Resolve the URL at delivery time, not at ingest time. Deliveries then track ServiceB changes; the cost is one extra dependency on the delivery path.
- Put a short-TTL config cache (1-5 min) in front of ServiceB to reduce its load and absorb jitter. Record resolved_url and serviceb_version on each attempt for traceability.
- Circuit breaker on ServiceB: if it is degraded, delay retries, mark events blocked_on_dependency, and alert.
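The short-TTL cache from the list above could look roughly like the following. This is a minimal sketch, not a definitive implementation: `UrlCache`, `ResolvedConfig`, the `fetch` callback standing in for the ServiceB client, and the 120-second TTL are all assumptions made for illustration. Serving a stale entry when ServiceB fails is also an assumed policy; the notes only say the cache should absorb jitter.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class ResolvedConfig:
    url: str
    version: str  # hypothetical ServiceB config version, recorded on each attempt


class UrlCache:
    """Short-TTL cache in front of a ServiceB lookup (all names are illustrative)."""

    def __init__(self, fetch: Callable[[str], ResolvedConfig],
                 ttl_seconds: float = 120.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # stand-in for the real ServiceB client call
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, ResolvedConfig]] = {}

    def resolve(self, cxid: str) -> ResolvedConfig:
        now = self._clock()
        hit = self._entries.get(cxid)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]            # fresh entry: no ServiceB call
        try:
            config = self._fetch(cxid)
        except Exception:
            # Assumed policy: fall back to the stale entry if one exists.
            if hit is not None:
                return hit[1]
            raise
        self._entries[cxid] = (now, config)
        return config
```

A worker would call `cache.resolve(cxid)` just before each delivery and copy `url` and `version` into the attempt record, so a later investigation can tell which ServiceB version a given attempt used.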
Architecture
flowchart LR
  A[Create Webhook Request] --> B[Ingest API]
  B --> Q[Queue]
  Q --> W[Worker]
  W --> S[ServiceB: Get URL]
  W --> T[Deliver HTTP]
  W --> R[Retry until 24h]
  W --> DLQ[(DLQ)]
Data model (cxid-keyed)
WebhookRequest(
cxid, request_id PK, payload_ref,
created_at, deadline_at=created_at+24h,
status, resolved_url, serviceb_version
)
Attempt(attempt_id, request_id, resolved_url, serviceb_version, http_status, ...)
deadline_at is a hard constraint: every retry-scheduling decision must check it, so that a backlog cannot cause retries to continue indefinitely.
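As a sketch of how the deadline constraint might be enforced in code, the request record can derive `deadline_at` once at creation, and the scheduler can consult a single guard before enqueueing any retry. The field names follow the data model above; `RETRY_WINDOW`, `may_schedule_retry`, and the default status value are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional

RETRY_WINDOW = timedelta(hours=24)


@dataclass
class WebhookRequest:
    cxid: str
    request_id: str                      # primary key
    payload_ref: str
    created_at: datetime
    deadline_at: datetime = field(init=False)  # derived, never passed in
    status: str = "pending"              # assumed initial status value
    resolved_url: Optional[str] = None
    serviceb_version: Optional[str] = None

    def __post_init__(self) -> None:
        self.deadline_at = self.created_at + RETRY_WINDOW


def may_schedule_retry(req: WebhookRequest, next_attempt_at: datetime) -> bool:
    """A retry is allowed only if it would start strictly before the deadline."""
    return next_attempt_at < req.deadline_at
```

Because the guard compares against the stored `deadline_at` rather than counting attempts, a request that sat in a backlog for hours still stops at the same wall-clock time.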
24-hour retry policy
- Exponential backoff with jitter. The schedule must fit enough attempts into the 24h window: e.g. 1s, 2s, 4s, ... with the delay capped at 10–30 min.
- Error classification: DNS failures, timeouts, and 5xx are retried; 4xx (except 429) is not retried; on 429, obey Retry-After.
- After the deadline: move the event to the DLQ, notify the tenant, and offer a manual replay button.
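The policy above can be sketched as two small functions plus a combiner. The notes do not say which jitter variant to use; this example assumes "full jitter" (a uniform draw between 0 and the capped exponential delay) and a 10-minute cap, the low end of the stated range. Function and constant names are illustrative, and treating every status outside 2xx/429/5xx as non-retryable is an assumption that goes slightly beyond the text, which only mentions 4xx.

```python
import random
from typing import Optional

BASE_DELAY_S = 1.0
MAX_DELAY_S = 600.0  # assumed cap: 10 min, the low end of the 10-30 min range


def backoff_delay(attempt: int, rng: random.Random) -> float:
    """Full-jitter backoff: uniform in [0, min(cap, base * 2**attempt)]."""
    exponent = min(attempt, 20)  # bound the exponent so the power stays small
    ceiling = min(MAX_DELAY_S, BASE_DELAY_S * (2 ** exponent))
    return rng.uniform(0.0, ceiling)


def classify(http_status: Optional[int]) -> str:
    """Return 'success', 'retry', or 'fail'. None means DNS failure or timeout."""
    if http_status is None:
        return "retry"
    if 200 <= http_status < 300:
        return "success"
    if http_status == 429 or http_status >= 500:
        return "retry"
    return "fail"  # remaining statuses (including other 4xx) are not retried


def next_delay(attempt: int, http_status: Optional[int],
               retry_after_s: Optional[float], rng: random.Random) -> Optional[float]:
    """Seconds to wait before the next attempt, or None if no retry is due."""
    if classify(http_status) != "retry":
        return None
    delay = backoff_delay(attempt, rng)
    if http_status == 429 and retry_after_s is not None:
        delay = max(delay, retry_after_s)  # never retry sooner than Retry-After
    return delay
```

For a sense of scale: without jitter, the delays 1s through 512s sum to 1023s, after which the 600s cap applies, so a 24h window holds roughly 150 attempts. The caller is still expected to check the computed retry time against deadline_at before enqueueing.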
Follow-ups
- What if ServiceB is down? The worker uses the cache and the circuit breaker, re-queues with backoff, and marks events blocked_on_dependency to protect capacity.
- Idempotency across cxid? The unique key is (cxid, request_id); attempts are append-only.
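For the "ServiceB is down" follow-up, a minimal consecutive-failure circuit breaker might look like the sketch below. The threshold, the cool-down, and the names `CircuitBreaker` and `resolve_or_block` are assumptions for illustration; `lookup` stands in for whatever call actually queries ServiceB. A production breaker would usually also limit how many probe calls pass after the cool-down.

```python
import time
from typing import Callable, Optional, Tuple


class CircuitBreaker:
    """Opens after N consecutive failures; allows calls again after a cool-down."""

    def __init__(self, failure_threshold: int = 5, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._threshold = failure_threshold
        self._reset_after = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self._opened_at is None:
            return True
        # Once the cool-down has elapsed, calls pass again; a failure re-opens.
        return self._clock() - self._opened_at >= self._reset_after

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self._threshold:
            self._opened_at = self._clock()


def resolve_or_block(breaker: CircuitBreaker, lookup: Callable[[str], str],
                     cxid: str) -> Tuple[str, Optional[str]]:
    """Return ('ok', url), or ('blocked_on_dependency', None) to re-queue."""
    if not breaker.allow():
        return ("blocked_on_dependency", None)   # skip ServiceB entirely
    try:
        url = lookup(cxid)
    except Exception:
        breaker.record_failure()
        return ("blocked_on_dependency", None)
    breaker.record_success()
    return ("ok", url)
```

When the result is `blocked_on_dependency`, the worker would re-queue the event with backoff instead of holding a delivery slot, which is what protects capacity while ServiceB recovers.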