OpenAI ★★ Frequent Hard Lexer/ParserASTSandbox

O22 · Design a Toy Language Interpreter O22 · 设计自定义语言解释器

Verified source经核实出处

Reported on 小红书 @Infra+RL+Robots as one of OpenAI's 5 SD-pool questions (2026). Fits OpenAI's taste for "real engineering" SD — not a distributed system but a language runtime you'd ship inside Code Interpreter / tool-use sandboxes. Credibility B.来自小红书 @Infra+RL+Robots（2026），OpenAI 五大 SD 题池之一。符合 OpenAI 偏好「工程实感」的 SD 风格——不是分布式系统而是 Code Interpreter / 工具沙箱可能用到的语言运行时。可信度 B。

Pipeline执行流水线

flowchart LR
  S[Source] --> L[Lexer
tokens]
  L --> P[Parser
AST]
  P --> SA[Semantic Analyzer
resolve + type-check]
  SA --> E[Evaluator
tree walker]
  E --> R[Runtime
heap, env, stdlib]
  R --> O[Output / stdio]
  E -.quota.-> SB[Sandbox limits
steps, mem, time]

Language surface (scoped for a 60-min whiteboard)语言特性（按 60 分钟白板量裁剪）

Values: int, float, bool, string, list, map, function.值：int、float、bool、string、list、map、function。
Syntax: let x = ..., if/else, while, fn name(args) { ... }, first-class functions.语法：let x = ...、if/else、while、fn name(args) { ... }、一等函数。
Scoping: lexical; nested environments (parent-pointer trees).作用域：词法；嵌套 env（父指针树）。
Stdlib: print, len, range, JSON round-trip. Everything else is a tool call.标准库：print、len、range、JSON 往返。其他全部走工具调用。

Parser choice语法分析选型

Option	When
Recursive descent	Best for whiteboard; hand-written; easy debugging.白板首选；手写；调试友好。
Pratt parsing	Clean operator precedence; pairs with recursive descent.优雅处理运算符优先级；与递归下降配合。
Parser generator (ANTLR, yacc)	Mention for production; don't write one live.生产环境可提；白板不现场写。

Evaluator: tree-walk vs bytecode求值：树遍历 vs 字节码

Start with tree-walking: eval(node, env) dispatches on AST type. Simple, debuggable, ~1-2× slower than bytecode. Mention bytecode VM (stack machine, constant pool, instruction decode loop) as the natural upgrade when perf matters.先用树遍历：eval(node, env) 按 AST 类型分发。简单、易调试、比字节码慢 1-2 倍。提及字节码 VM（栈机器、常量池、指令解码循环）作为性能升级方向。

Sandbox & safety (OpenAI's real reason to ask this)沙箱与安全（OpenAI 真正在乎的）

Step budget: increment counter per eval step; throw OutOfSteps after N.步数预算：每步自增计数器；超 N 抛 OutOfSteps。
Memory cap: custom heap with allocated-byte tracking; refuse allocations past ceiling.内存上限：自定义堆跟踪已分配字节；超上限拒绝分配。
Wall-clock watchdog: separate thread cancels execution past deadline.墙钟看门狗：独立线程到期取消执行。
No host escape: no raw FS, network, or exec — only explicit stdlib + tool calls.无宿主逃逸：无裸 FS、网络、exec —— 只暴露标准库 + 工具调用。
Deterministic mode: fix PRNG seed; freeze time; makes testing and judging reproducible.确定性模式：固定 PRNG 种子；冻结时间；测试与判题可复现。

Error UX错误提示

Token / node carries (line, col, span); errors render a caret-underlined snippet.Token / 节点带 (line, col, span)；错误渲染带下划线的片段。
Separate SyntaxError, NameError, TypeError, RuntimeError.区分 SyntaxError、NameError、TypeError、RuntimeError。

Follow-ups追问

Add closures? Capture surrounding environment by reference in the Function value.加闭包？在 Function 值中按引用捕获外层 env。
Add types? HM type inference for let/fn; escape hatch with any.加类型？let/fn 用 HM 类型推导；用 any 做逃逸口。
REPL vs script? REPL keeps one long-lived env; script starts fresh. Both wrapped in same sandbox.REPL vs 脚本？REPL 保持长 env；脚本每次新 env。都在同一沙箱。
How would Code Interpreter use this? Each user message opens a fresh VM; tool-call results come back as values injected into the env.Code Interpreter 如何用？每条用户消息开新 VM；工具调用结果作为值注入 env。