COLM 2026 · Under Review

The Model Says Walk:
How Surface Heuristics Override Implicit Constraints in LLM Reasoning

Large language models systematically fail when a salient surface cue conflicts with an unstated feasibility constraint. We diagnose the mechanism, measure it at scale across 14 frontier models, and show a zero-cost mitigation.

Code will be released upon acceptance.

The car wash test

A single sentence exposes the failure cleanly: no specialised knowledge, no multi-step arithmetic — just a conflict between a surface heuristic and an implicit constraint.

car-wash-test · zero-shot
User I want to wash my car. The car wash is 50 meters away. Should I walk or drive?
Frontier LLM · typical response Since the car wash is only 50 meters away, you should just walk. It's such a short distance that driving would waste fuel and be slower once you factor in parking. Enjoy the fresh air!
✗ Missed constraint
Correct answer Drive. You cannot wash a car that is not at the car wash — the car must be co-located with the wash. Distance is irrelevant when the presence constraint is violated.
✓ Constraint satisfied

— @knowmadd, Mastodon, Feb 2026. In a follow-up single-pass sweep of 53 models, 42 recommended walking.

Six numbers

Under a strict 10/10 consistency criterion across 500 instances, no model reliably overrides salient heuristics when they conflict with hidden constraints.
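The strict criterion can be made precise with a short sketch: an instance counts as solved only if the model answers it correctly on all 10 independent passes (the data structure below is illustrative, not the released evaluation harness):

```python
# Sketch of the strict 10/10 consistency criterion: an instance counts as
# solved only if the model answers correctly on all 10 independent passes.
from dataclasses import dataclass
from typing import List

@dataclass
class InstanceResult:
    gold: str            # e.g. "drive"
    answers: List[str]   # 10 sampled answers for this instance

def strict_accuracy(results: List[InstanceResult]) -> float:
    """Fraction of instances answered correctly on every one of the 10 passes."""
    solved = sum(all(a == r.gold for a in r.answers) for r in results)
    return solved / len(results)

# Example: 9/10 correct passes still counts as a miss under the strict criterion.
demo = [
    InstanceResult("drive", ["drive"] * 10),            # solved
    InstanceResult("drive", ["drive"] * 9 + ["walk"]),  # not solved
]
```

Under this criterion, occasional lucky passes do not count; only models that never fall for the heuristic on a given instance get credit for it.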

74.6%

Ceiling accuracy

Best model (Gemini 3.1 Pro) tops out at 74.6% strict override accuracy — no frontier system exceeds 75%.

9–38×

Heuristic dominance

In the car-wash case study, distance exerts 9–38× more causal influence on the decision than the goal.

+15 pp

One-word hint

A single italicised hint recovers +15.3 pp on average — the knowledge is present; the bottleneck is inference.

12 / 14

Conservative bias

12 of 14 models score worse when the constraint is removed (drops of up to −38.5 pp).

44%

Presence constraints

C-pres (object must be co-located with goal) is the hardest family — mean 44.4% across all 14 models.

+6–9 pp

Goal-decomposition

Prompting models to enumerate preconditions first recovers +6–9 pp on weaker models — a zero-cost fix.

Why the test is diagnostic

The input decomposes into three spans, two of which pull the decision in opposite directions. Across six open models, the distance span dominates the decision by 9–38×.

The structure

  • goal  “get my car washed” — implies the car must be co-located with the wash.
  • heuristic  “50 m away” — short distance ⇒ walking is the default answer.
  • options  walk vs. drive — a forced binary.

The correct answer is drive: you cannot wash a car that is not at the car wash. Yet every paraphrase, across every model we tested in Study 1, produces the wrong answer — 0% accuracy.
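The dominance measurement above can be sketched as a simple span-occlusion experiment: mask one span at a time, re-score the walk-vs-drive decision, and compare the shifts. The scorer below is a toy stand-in (a real run would read the model's walk/drive logits), so the exact ratio is illustrative:

```python
# Sketch of the causal-occlusion measurement: mask each span, re-score the
# decision, and compare how much each span moves it.
from typing import Callable, Dict

PROMPT = "I want to wash my car. The car wash is 50 meters away. Should I walk or drive?"
SPANS = {
    "goal": "wash my car",
    "heuristic": "50 meters away",
}

def occlusion_effect(score: Callable[[str], float], prompt: str, span: str) -> float:
    """Absolute change in the walk-vs-drive score when one span is masked."""
    masked = prompt.replace(span, "[MASK]")
    return abs(score(prompt) - score(masked))

def dominance_ratio(score: Callable[[str], float], prompt: str, spans: Dict[str, str]) -> float:
    """How many times more the heuristic span moves the decision than the goal span."""
    return (occlusion_effect(score, prompt, spans["heuristic"])
            / occlusion_effect(score, prompt, spans["goal"]))

# Toy stand-in scorer: higher values lean "walk". A real run would return
# log P(walk) - log P(drive) from the model under each occluded prompt.
def toy_score(text: str) -> float:
    s = 0.0
    if "meters away" in text:
        s += 0.9   # short distance pushes hard toward "walk"
    if "wash my car" in text:
        s += 0.05  # the goal barely registers
    return s

print(round(dominance_ratio(toy_score, PROMPT, SPANS), 1))
```

With the toy weights above, the heuristic span moves the decision roughly an order of magnitude more than the goal span, mirroring the 9–38× range reported for the open models.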

Cross-model monotonicity overlay
All six models' conflict curves (solid) are sigmoids that track the control (dashed grey) — a goal-independent mapping from distance to decision.

Diagnose → Measure → Bridge → Treat

A four-stage arc that goes from a single viral example to a benchmark and a mitigation.

01 · Diagnose

Mechanistic analysis

Causal occlusion + monotonicity curves on six open models. Distance dominates by 9–38×; goal spans barely move the decision.

02 · Measure

The HOB benchmark

500 instances across 4 heuristic × 5 constraint families, with minimal pairs and explicitness gradients, evaluated on 14 frontier models.

03 · Bridge

Parametric probes

Four probes extend the sigmoid analysis to cost, efficiency, and semantic-similarity heuristics across three constraint families.

04 · Treat

Goal-decomposition

A one-line prefix prompting the model to list preconditions before answering recovers +9 pp on Llama 4 Scout — no tuning required.
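The mitigation amounts to prepending a fixed instruction to the user's query. The wording below is illustrative (the paper's verbatim prefix may differ), but the shape of the intervention is just string concatenation:

```python
# Sketch of the goal-decomposition mitigation: a fixed one-line prefix asking
# the model to enumerate preconditions before answering. The exact wording is
# illustrative, not the paper's verbatim prompt.
PREFIX = ("Before answering, list every precondition the goal requires, "
          "then check each option against those preconditions.")

def goal_decomposition(user_prompt: str) -> str:
    """Prepend the precondition-enumeration instruction to the user's query."""
    return f"{PREFIX}\n\n{user_prompt}"

prompt = goal_decomposition(
    "I want to wash my car. The car wash is 50 meters away. Should I walk or drive?"
)
```

Because the prefix is static, it adds no training, no extra model calls, and only a handful of input tokens, which is why the recovery on weaker models comes at effectively zero cost.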

Read the full method →

BibTeX

@article{li2026model,
  title={The Model Says Walk: How Surface Heuristics Override Implicit Constraints in LLM Reasoning},
  author={Li, Yubo and Zhang, Lu and Jiang, Tianchong and Krishnan, Ramayya and Padman, Rema},
  journal={arXiv preprint arXiv:2603.29025},
  year={2026}
}