OpenAI ★★ Frequent Medium Gateway Multi-Region Failover

O38 · Design a Multi-Region API Gateway

Verified source

Classic system design question, extended at OpenAI onsites. Credibility: B.

Architecture

flowchart LR
  Client --> DNS[GeoDNS / Anycast]
  DNS --> EDGE[Edge PoP - TLS, auth, rate limit]
  EDGE --> RG1[Region US]
  EDGE --> RG2[Region EU]
  EDGE --> RG3[Region APAC]
  EDGE --> CFG[(Config plane)]

Key decisions

  • **Anycast + Edge TLS**: the user hits the nearest PoP; the edge gateway terminates TLS there and forwards the request to the nearest region.
  • **Auth at edge** with short-lived JWTs; each JWT carries an org_id that drives routing and serves as the rate-limit shard key.
  • **Rate limit: token bucket in Redis**, kept per region, with asynchronous global reconciliation.
  • **Automated failover**: health probes; DNS TTL of 30s; reads fail open, writes fail closed.
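
The per-region token bucket keyed by org_id can be sketched as below. This is a minimal illustration under assumptions, not a production design: a plain dict stands in for the regional Redis, the names `RegionalRateLimiter`, `capacity`, and `refill_per_sec` are hypothetical, and the asynchronous global reconciliation step is left out. The caller passes the clock in so the behavior is deterministic.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Bucket:
    tokens: float        # tokens currently available
    last_refill: float   # timestamp (seconds) of the last refill


@dataclass
class RegionalRateLimiter:
    """Token bucket per org_id. A dict stands in for the regional Redis (assumption)."""
    capacity: float          # maximum burst size (hypothetical parameter)
    refill_per_sec: float    # sustained request rate (hypothetical parameter)
    buckets: Dict[str, Bucket] = field(default_factory=dict)

    def allow(self, org_id: str, now: float, cost: float = 1.0) -> bool:
        """Return True and consume `cost` tokens if the org is within its limit."""
        bucket = self.buckets.get(org_id)
        if bucket is None:
            # First request from this org in this region: start with a full bucket.
            bucket = Bucket(tokens=self.capacity, last_refill=now)
            self.buckets[org_id] = bucket

        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - bucket.last_refill)
        bucket.tokens = min(self.capacity, bucket.tokens + elapsed * self.refill_per_sec)
        bucket.last_refill = now

        if bucket.tokens >= cost:
            bucket.tokens -= cost
            return True
        return False


limiter = RegionalRateLimiter(capacity=2, refill_per_sec=1)
burst = [limiter.allow("org_a", now=0.0) for _ in range(3)]   # third call exceeds the burst
after_wait = limiter.allow("org_a", now=1.0)                  # one token refilled after 1s
other_org = limiter.allow("org_b", now=0.0)                   # separate shard key, separate bucket
```

Against a real Redis the read-refill-consume sequence would have to run atomically, for example inside a server-side Lua script, because concurrent gateway workers share the same bucket.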

Follow-ups

  • Data residency? EU requests are pinned to the EU region via a routing header plus policy.
  • Risky changes? Canary at 1% of traffic in one region, then a weighted rollout.
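
Both follow-ups come down to one routing decision at the edge, which the sketch below illustrates. It is hedged throughout: `route`, `residency_pin`, `canary_region`, and `canary_bps` are hypothetical names, the pin is assumed to have been extracted from the routing header and validated against policy already, and hashing the request id is just one possible way to get a stable 1% split.

```python
import hashlib
from typing import Optional, Tuple

REGIONS = ("us", "eu", "apac")  # mirrors the three regions in the diagram


def canary_bucket(request_id: str) -> int:
    """Map a request id deterministically to a bucket in 0..9999 (basis points)."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 10_000


def route(
    nearest_region: str,
    residency_pin: Optional[str],
    request_id: str,
    canary_region: str = "us",   # hypothetical: the single region running the canary
    canary_bps: int = 100,       # 100 basis points = 1% of that region's traffic
) -> Tuple[str, str]:
    """Return (region, track), where track is "canary" or "stable"."""
    # A residency pin always overrides latency-based routing.
    region = residency_pin if residency_pin is not None else nearest_region
    if region not in REGIONS:
        raise ValueError(f"unknown region: {region}")

    # Only the canary region carries the risky change, and only for a small slice.
    is_canary = region == canary_region and canary_bucket(request_id) < canary_bps
    return region, "canary" if is_canary else "stable"


pinned = route("us", "eu", "req-1")            # EU pin wins even if the nearest region is US
unpinned = route("apac", None, "req-1")        # no pin: nearest region is used
all_canary = route("us", None, "req-1", canary_bps=10_000)
no_canary = route("us", None, "req-1", canary_bps=0)
```

Raising `canary_bps` step by step gives the weighted rollout; because the bucket is derived from the request id, the same request always lands on the same track for a given weight.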

Related study-guide topics