Transactions & Isolation

ACID — what each letter really means

Jim Gray coined ACID in 1981, and Andreas Reuter added the 'I' for Isolation in 1983. DDIA Ch.7 takes pains to point out that vendors mean different things by each letter; "ACID-compliant" on a brochure is marketing, not a spec.

Atomicity: all-or-nothing on crash. Not about concurrency. Usually implemented via WAL + undo/redo.
Consistency: invariants preserved. The application's job; the DB can only enforce declared constraints (FK, CHECK). Largely aspirational.
Isolation: concurrent transactions do not interfere. The interesting letter. Serializability is the gold standard; nobody ships it by default because it is slow.
Durability: once committed, survives a crash. fsync on the WAL to local SSD gives single-machine durability; replication to another zone gives datacenter-loss durability.

Source cross-reference

DDIA Ch.7 is the canonical reference; Acing SDI Ch.5 adds 2PC/saga case studies with idempotency patterns from e-commerce and payment systems.

The anomaly zoo

What goes wrong without isolation?

Anomaly	Scenario	Prevented by
Dirty read	T2 reads T1's uncommitted writes; T1 aborts.	Read Committed
Dirty write	T2 overwrites T1's uncommitted write.	Read Committed (row lock)
Non-repeatable read	T1 reads x twice, sees different values because T2 committed between.	Repeatable Read / Snapshot
Lost update	Two T1/T2 do read-modify-write on x; one write is lost.	Explicit lock, `CAS`, or SI + write-conflict detection
Write skew	Two transactions each read a set, write based on it, together violate an invariant (e.g., both on-call doctors resign).	Serializable / SSI / materializing conflicts
Phantom	Range query repeated after another commit sees new rows.	Predicate lock / range lock / Serializable

Write skew is the one that bites senior engineers. A "doctor on-call" invariant (at least one doctor must remain on-call) is violated when two resign concurrently — each reads 2 doctors, each decides it's fine to leave. Snapshot isolation does not prevent this.

Isolation levels in practice

SQL standard levels are historical fiction; vendors implement what they can. Actual mapping (2024):

Postgres default: Read Committed. Upgradable to Repeatable Read (= Snapshot Isolation) and Serializable (= SSI).
MySQL InnoDB default: Repeatable Read (with weird phantom behavior — gap locks).
Oracle: Read Committed and Serializable (their "Serializable" is actually Snapshot Isolation).
Spanner: Strict Serializable (serializable + linearizable) by default.
DynamoDB: Read Committed per-item; TransactWriteItems offers serializable across items in a single request.

Anti-pattern

Assuming "Repeatable Read" means repeatable read. MySQL's RR allows phantoms in some edge cases; Oracle's "Serializable" allows write skew. Always reference the actual isolation level as implemented, not the SQL-standard name.

Snapshot isolation vs SSI

Snapshot Isolation (SI): each transaction reads from a consistent snapshot taken at its start. Writes are buffered; at commit, check if any key you wrote was also written by a committed concurrent txn — if so, abort (first-committer-wins). Implemented with MVCC (Postgres, Oracle). Fast, no read locks, prevents dirty/non-repeatable reads but not write skew.

Serializable Snapshot Isolation (SSI): Cahill et al. 2008 (used by Postgres SERIALIZABLE, CockroachDB). Tracks read–write dependencies at runtime; aborts a transaction whose dependency graph forms a dangerous cycle. Gives true serializability at 10–30% throughput cost over SI. This is the default you want when write skew matters (finance, booking).

Performance data: Postgres 15 on pgbench serializable vs read-committed shows about 20% throughput drop at low contention, 50%+ at high contention. Worth it when correctness matters; measure before shipping.

Two-phase commit

2PC coordinates an atomic commit across N participants. Phase 1 (Prepare): coordinator asks all to prepare; each durably writes "ready" or "abort". Phase 2 (Commit): if all ready, coordinator writes "commit" and tells all; otherwise "abort".

sequenceDiagram
    participant C as Coordinator
    participant P1 as Participant 1
    participant P2 as Participant 2
    C->>P1: Prepare
    C->>P2: Prepare
    P1-->>C: ready (durable)
    P2-->>C: ready (durable)
    C->>C: write commit record
    C->>P1: Commit
    C->>P2: Commit
    P1-->>C: ack
    P2-->>C: ack

Fatal flaw: if the coordinator crashes after some participants voted ready but before the commit decision is replicated, those participants are stuck holding locks indefinitely (until a human runs recovery). This is why "XA transactions" have a bad reputation and why Spanner uses Paxos-backed coordinators so the decision itself is fault-tolerant.

Throughput cost: 2PC doubles the write latency (two rounds) and quadruples the WAL traffic. Budget ~20–50 ms per cross-shard commit in-region, hundreds of ms across regions.

Sagas and compensations

When 2PC is too expensive (microservices across teams, different databases, long-running workflows), use a saga: a sequence of local transactions where each has a compensating action for rollback. Originally: Garcia-Molina & Salem 1987 on long-lived transactions.

Example — book a trip:

T1: reserve flight. Compensation: cancel flight.
T2: reserve hotel. Compensation: cancel hotel.
T3: charge card. Compensation: refund.

If T3 fails, run T2-comp then T1-comp. Sagas are not ACID; you lose isolation — another transaction can see half-applied state. You must design idempotent compensations, handle compensation failures (retry queue, human escalation), and add semantic locks (mark "pending" rows) to prevent double booking. Acing SDI Ch.5 gives the pattern with failure-handling state machines.

Idempotency as a primitive

At any layer above a single-node DB, retries are inevitable — client, load balancer, queue. Every mutation must therefore be idempotent: f(f(x)) = f(x).

Idempotency key: client generates a UUID per logical operation, server stores "(key, result)" with TTL. Retries return the stored result, do not re-execute. Stripe popularized this for payments.
Conditional writes: UPDATE ... WHERE version = N, DynamoDB ConditionExpression. Provides optimistic concurrency control.
Upserts: INSERT ... ON CONFLICT DO UPDATE. Safe to retry.

OpenAI-specific

The POST /v1/chat/completions endpoint accepts a client-supplied idempotency-key (or uses request fingerprinting) so that a retried request does not double-charge tokens or duplicate tool calls. Mention this explicitly for any billing/agent question.

Anthropic-specific

For tool-calling agents that make external HTTP requests, sagas with explicit compensation beat 2PC every time. Anthropic interviewers look for: idempotency keys on tool calls, compensation action per tool, max-retry with exponential backoff, and a manual-review queue when compensation fails.

ACID —— 每个字母的真实含义

Jim Gray 1981 年造了 ACID，Andreas Reuter 1983 年加了 I（隔离）。DDIA 第 7 章特别强调各厂商对每个字母的含义都不一样；宣传页上的「ACID 兼容」是营销语不是规范。

Atomicity（原子性）：崩溃时全有或全无。不是并发。WAL + undo/redo 实现。
Consistency（一致性）：保不变式。是应用的职责；DB 只能强制声明的约束（FK、CHECK）。多半是愿景。
Isolation（隔离性）：并发事务互不干扰。最有趣的字母。可串行化是金标，默认没人跑，因为慢。
Durability（持久性）：提交后崩溃不丢。WAL fsync 到本地 SSD 保单机，跨 zone 复制保机房丢失。

来源交叉引用

DDIA 第 7 章是权威；Acing SDI 第 5 章补 2PC/saga 案例与电商/支付的幂等模式。

异常动物园

没有隔离会出什么事？

异常	场景	防御
脏读	T2 读了 T1 未提交；T1 回滚。	Read Committed
脏写	T2 覆盖了 T1 未提交写。	Read Committed（行锁）
不可重复读	T1 两次读 x 不同，中间 T2 提交。	Repeatable Read / 快照
丢失更新	两个读-改-写，一次写被吞。	显式锁 / `CAS` / SI + 写冲突
写偏斜	两个事务各读一组、基于此写，合起来破坏不变式（两个值班医生同时辞职）。	Serializable / SSI / 物化冲突
幻读	同一范围查询再看到新行。	谓词锁 / 范围锁 / Serializable

写偏斜专咬 senior。「至少一位值班医生」不变式下两人同时辞职——各看到还有 2 位、各以为可以走。快照隔离防不住这个。

实务中的隔离级别

SQL 标准隔离级别是历史小说，厂商各自实现。2024 实际映射：

Postgres 默认：Read Committed。可升至 Repeatable Read（= 快照隔离）与 Serializable（= SSI）。
MySQL InnoDB 默认：Repeatable Read（带怪异幻读——gap lock）。
Oracle：Read Committed 与 Serializable（其「Serializable」实为快照隔离）。
Spanner：默认严格可串行化（可串行 + 线性一致）。
DynamoDB：单条目 Read Committed；TransactWriteItems 单请求内多条目可串行。

反模式

假定「Repeatable Read」就是可重复读。MySQL RR 在边界允许幻读；Oracle「Serializable」允许写偏斜。永远引用实际实现的级别，不是标准名。

快照隔离 vs SSI

快照隔离（SI）：每个事务从启动时一致快照读。写先缓存；提交时检查写 key 是否被并发已提交事务写过，有则 abort（first-committer-wins）。MVCC 实现（Postgres、Oracle）。快，无读锁，防脏读/不可重复读，但防不住写偏斜。

可串行化快照隔离（SSI）：Cahill 等 2008（Postgres SERIALIZABLE、CockroachDB 用）。运行时追踪读写依赖，把危险环的事务 abort。真可串行化，比 SI 吞吐低 10–30%。写偏斜要命时（金融、订座）用这个。

性能数据：Postgres 15 pgbench 中 serializable vs read-committed 低竞争下吞吐降 ~20%，高竞争 50%+。正确性重要就值得，上线前测。

两阶段提交

2PC 跨 N 个 participant 协调原子提交。阶段 1（Prepare）：coordinator 问全员，各自持久写「ready」或「abort」。阶段 2（Commit）：全 ready 则 coordinator 写 commit 记录并通知，否则 abort。

sequenceDiagram
    participant C as Coordinator
    participant P1 as Participant 1
    participant P2 as Participant 2
    C->>P1: Prepare
    C->>P2: Prepare
    P1-->>C: ready（持久）
    P2-->>C: ready（持久）
    C->>C: 写 commit 记录
    C->>P1: Commit
    C->>P2: Commit
    P1-->>C: ack
    P2-->>C: ack

致命缺陷：coordinator 在部分 participant 已 ready 但 commit 决定未复制出去时崩溃，这些 participant 将无限期持锁（待人工恢复）。这就是「XA 事务」名声差的原因，也是 Spanner 用 Paxos 支持 coordinator 的原因——决定本身容错。

吞吐代价：2PC 写延迟翻倍（两轮）、WAL 流量四倍。同区域跨分片提交 ~20–50 ms，跨区域几百 ms。

Saga 与补偿

2PC 太贵时（跨团队微服务、不同 DB、长流程）用 saga：一系列本地事务，每个有回滚补偿。原典：Garcia-Molina & Salem 1987 长事务论文。

订行程例：

T1：订机票。补偿：退票。
T2：订酒店。补偿：退订。
T3：扣款。补偿：退款。

T3 失败则跑 T2-comp 再 T1-comp。Saga 不是 ACID——你失去隔离，别的事务能看到半状态。补偿必须幂等、处理补偿失败（重试队列、人工升级）、加语义锁（pending 标记）防重复订。Acing SDI 第 5 章给出带失败态的状态机模式。

幂等作为原语

单机 DB 以上的任何层级都会重试——客户端、LB、队列。每个写操作必须幂等：f(f(x)) = f(x)。

幂等 key：客户端为每个逻辑操作生 UUID，服务端存「(key, result)」带 TTL。重试返存的结果，不再执行。Stripe 把此模式推广到支付。
条件写：UPDATE ... WHERE version = N、DynamoDB ConditionExpression。乐观并发控制。
upsert：INSERT ... ON CONFLICT DO UPDATE。安全重试。

OpenAI 专属

POST /v1/chat/completions 接受客户端 idempotency-key（或用请求指纹），使重试请求不重复扣 token、不重复调用工具。任何计费/agent 题都要提这一点。

Anthropic 专属

调外部 HTTP 的工具型 agent，saga + 显式补偿永远优于 2PC。Anthropic 面试看：工具调用带幂等 key、每工具一个补偿动作、指数退避重试上限、补偿失败入人工审核队列。