OpenAI ★★★ Frequent Medium Base62RedirectHot Cache

O20 · Design a URL Shortener (Shorten URL) O20 · 设计短链接系统

Verified source经核实出处

Reported on 小红书 by @Infra+RL+Robots (Recruiter) as one of OpenAI's five SD-pool classics — candidates are told to prepare all five rather than bet on one. Candidate in that post had prepared the CI/CD track and got hit with this instead. Credibility B (recruiter-attributed, 2026).来源:小红书 @Infra+RL+Robots 招聘官,明确点名为 OpenAI 的五大 SD 题池之一——建议全部准备而不是赌一道。帖子里候选人重点准备 CI/CD,结果考了这题。可信度 B(招聘官明示,2026)。

Cross-referenced: Alex Xu V1 Ch.8, ByteByteGo "Design a URL Shortener", IGotAnOffer.交叉参考:Alex Xu V1 第 8 章、ByteByteGo「短链设计」、IGotAnOffer。

Functional requirements功能需求

  • POST /shorten {url} → returns 7-char short code.POST /shorten {url} → 返回 7 位短码。
  • GET /{code} → 301/302 redirect to original URL.GET /{code} → 301/302 跳转原 URL。
  • Optional: custom alias, expiration, per-user analytics.可选:自定义别名、过期时间、按用户统计。

Back-of-envelope容量估算

Assume 100M new URLs/day → ~1.2K write QPS; read/write = 100:1 → 120K read QPS (peak ~500K). 10 years retention → 365 B URLs. With 7-char Base62 you get 62⁷ ≈ 3.5 T addresses → safe.假设每天 1 亿新链接 → 写 QPS ~1.2K;读写比 100:1 → 读 QPS 12 万(峰值 ~50 万)。保留 10 年 → 3650 亿链接。7 位 Base62 = 62⁷ ≈ 3.5 万亿地址 → 充裕。

Short-code generation — three options短码生成的三种方案

OptionHowTrade-off
Hash-of-URL (MD5/SHA1 → base62[0:7]) Deterministic; same URL → same code.确定性;同 URL → 同短码。 Collisions need resolution (rehash + probe). Hard to customize.冲突需处理(rehash + 探测)。难以自定义。
Auto-increment ID → Base62 encode Single counter or sharded counters (Snowflake-like).单计数器或分片计数器(Snowflake 风格)。 Easy & no collisions; but IDs are guessable (enumeration attack) and hot-shard risk.简单无冲突;但 ID 可枚举(攻击面)、热分片风险。
Pre-generated code pool Offline job fills a "free codes" queue; workers POP to assign.离线任务填充"空闲短码"队列;worker 从中 POP 分配。 Decouples gen from assign, trivial to scale, but requires pool monitoring.生成与分配解耦,扩展简单,但需监控池水位。

Recommended: sharded counter + Base62 for simplicity; add per-user custom-alias path that goes through a separate uniqueness check.推荐:分片计数器 + Base62,简单易扩展;自定义别名走独立唯一性校验路径。

Architecture架构

flowchart LR
  U[Client] --> LB[Load Balancer]
  LB --> API[Shortener API]
  API --> IDG[ID Generator
sharded counter] API --> DB[(KV Store: code → url)] API --> CACHE[Redis hot-link cache] U2[Reader] --> LB2[LB + CDN] LB2 --> R[Redirect Service] R --> CACHE R --> DB R -.async.-> AN[Analytics pipeline
Kafka → ClickHouse]

Data model数据模型

url_map:
  code        VARCHAR(7)  PRIMARY KEY
  long_url    TEXT
  user_id     UUID        -- NULL for anonymous
  created_at  TIMESTAMP
  expires_at  TIMESTAMP   -- NULL = never

Index on (user_id, created_at) for user dashboards.

Read path: cache hierarchy读路径:缓存层次

  1. CDN edge cache for the 301 itself (TTL = min(expires_at - now, 10m)).CDN 边缘缓存 301 响应(TTL = min(expires_at − now, 10 分钟))。
  2. Redis LRU for hot codes (>80 % hit-rate target).Redis LRU 缓存热门短码(目标命中率 >80 %)。
  3. KV store (Cassandra / DynamoDB) as authority.KV 存储(Cassandra / DynamoDB)作为权威。

Follow-ups追问

  • 301 vs 302? 302 if you want per-click analytics (browser re-hits server). 301 is cacheable by CDN → faster but loses tracking.301 还是 302?要做点击分析就用 302(浏览器每次回源)。301 可被 CDN 缓存 → 更快但失去跟踪。
  • Expired / deleted? Soft-delete with expires_at; GC job. Serve 410 Gone.过期/删除?expires_at 软删除;GC 作业清理。返回 410 Gone。
  • Abuse / malware URLs? Async Safe-Browsing lookup at creation, block known-bad; periodic re-scan.滥用 / 恶意链接?创建时异步查 Safe-Browsing,阻止已知恶意;周期性重扫。
  • Custom alias race? Use DB unique constraint; retry on conflict with friendly error.自定义别名竞争?DB 唯一约束;冲突时重试并给出友好错误。
  • Analytics without hot-row DB writes? Fire-and-forget Kafka event; ClickHouse for aggregation.不在热路径写 DB 做统计?发送 Kafka 事件即走;ClickHouse 聚合。

Related study-guide topics相关学习手册专题