O36 · Design a Distributed Log / Trace Pipeline
Verified source
Classic observability problem adapted at OpenAI onsites (一亩三分地, 2025). Credibility B.
Architecture
flowchart LR
    APP[Apps / agents] --> COL[OTEL Collector]
    COL --> BUS[(Kafka)]
    BUS --> HOT[(Hot store - 7d)]
    BUS --> COLD[(Cold S3/Parquet - 1y)]
    HOT --> SEARCH[Search UI]
    COLD --> LAKE[Lakehouse queries]
    BUS --> SAMP[Adaptive sampler]
Key decisions
- **Tail-based sampling** for traces: decide after a trace completes, keep error and slow traces, and drop normal ones, which cuts storage cost sharply.
- **Tiered retention**: 7 days hot (searchable), 1 year cold (Parquet on object storage).
- **Schema-aware columnar storage** for low-cost slicing and dicing by fields such as GPU_id and model_id.
- **Cardinality guards**: enforce a cap on unique values per label, and reject metrics that carry user_id as a label.
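
Two of these decisions are small enough to sketch in code: the tail-based keep/drop decision and the per-label cardinality cap. The sketch below is illustrative only. `Span`, `keep_trace`, `CardinalityGuard`, the 1000 ms slow threshold, and the default cap of 1000 values are all hypothetical names and numbers, not part of the OTEL Collector or any real library.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Span:
    trace_id: str
    duration_ms: float
    is_error: bool = False


def keep_trace(spans: list[Span], slow_ms: float = 1000.0) -> bool:
    """Tail-based decision, made once all spans of a trace have arrived.

    Keeps the trace if any span errored or was slow. The slow_ms
    threshold is an assumed value for illustration.
    """
    return any(s.is_error or s.duration_ms >= slow_ms for s in spans)


@dataclass
class CardinalityGuard:
    # Both defaults are assumptions chosen for the example.
    max_values_per_label: int = 1000
    banned_labels: frozenset = frozenset({"user_id"})
    _seen: dict = field(default_factory=dict)  # label name -> set of values

    def admit(self, labels: dict[str, str]) -> bool:
        """Return True if the metric may be ingested, False if rejected."""
        if any(name in self.banned_labels for name in labels):
            return False
        # Check every label first so a rejected metric records nothing.
        for name, value in labels.items():
            seen = self._seen.get(name, set())
            if value not in seen and len(seen) >= self.max_values_per_label:
                return False
        for name, value in labels.items():
            self._seen.setdefault(name, set()).add(value)
        return True


if __name__ == "__main__":
    trace = [Span("t1", 20.0), Span("t1", 1500.0)]
    print(keep_trace(trace))

    guard = CardinalityGuard(max_values_per_label=2)
    print(guard.admit({"model_id": "a"}), guard.admit({"user_id": "u1"}))
```

A real tail sampler also has to buffer spans until the trace is judged complete (usually by a timeout) and route all spans of one trace to the same sampler instance. Both are omitted here.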
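Follow-ups
- Delivery semantics? At-least-once delivery plus idempotent sinks, so redelivered records do not create duplicates.
- Query latency on 1 year of data? Partitioned Parquet plus predicate pushdown, so a query reads only the partitions and row groups that match its filters.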
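
The at-least-once plus idempotent sink answer can be illustrated with a minimal sketch. It assumes the sink upserts by a key derived from the Kafka record coordinates (topic, partition, offset), which uniquely identify a record within a cluster. `IdempotentSink` and `deliver` are hypothetical names, and the in-memory dict stands in for a real store.

```python
class IdempotentSink:
    """In-memory stand-in for a store that upserts by primary key."""

    def __init__(self) -> None:
        self.rows = {}  # key -> stored value

    def write(self, record: dict) -> bool:
        # Assumed key scheme: a redelivered record has the same topic,
        # partition, and offset, so it maps to the same row.
        key = f'{record["topic"]}/{record["partition"]}/{record["offset"]}'
        is_new = key not in self.rows
        self.rows[key] = record["value"]  # upsert: overwrite is harmless
        return is_new


def deliver(batch: list, sink: IdempotentSink) -> int:
    """Write a batch and return how many records were new to the sink."""
    return sum(sink.write(r) for r in batch)


if __name__ == "__main__":
    batch = [
        {"topic": "logs", "partition": 0, "offset": i, "value": f"line {i}"}
        for i in range(3)
    ]
    sink = IdempotentSink()
    first = deliver(batch, sink)
    # Simulate a consumer crash before the offset commit: the same
    # batch is delivered again after restart.
    second = deliver(batch, sink)
    print(first, second, len(sink.rows))
```

Keying on the offset only deduplicates consumer-side redelivery. If producers can also retry and write the same event twice at different offsets, the key would need to be an event ID assigned at the source.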