Confidence-Driven Inference

Online confidence control at candidate, state, and token levels (§4)

Overview

At inference time, confidence shapes three critical decisions: (1) which candidate response to return (output selection), (2) when to stop generating or searching (adaptive stopping), and (3) how to shape the token distribution during decoding (decoding control).

Unlike training, where confidence signals amortize across many examples, inference makes decisions online, where they immediately affect the user experience. Confidence signals must therefore be fast, interpretable, and reliable.

Output Selection

When multiple candidate responses are available (from beam search, sampling, or multi-turn generation), confidence scores decide which one to present to the user.

Wang et al. · ICLR 2023
Samples multiple reasoning paths and selects the most frequently occurring answer, using agreement as confidence.
Source: Self · Unit: candidate · Role: vote · Access: MS
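The core of self-consistency selection fits in a few lines. Here `answers` stands for the final answers parsed from independently sampled reasoning paths; this is an illustrative sketch, not the paper's implementation:

```python
from collections import Counter

def self_consistency_select(answers):
    """Return the majority answer and its vote share as a confidence proxy."""
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)

# 5 sampled paths; 3 agree, so "42" wins with confidence 0.6
ans, conf = self_consistency_select(["42", "17", "42", "42", "19"])
```

The vote share doubles as an (uncalibrated) confidence score that downstream methods can threshold or reweight.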
Taubenfeld et al. · ACL Findings 2025
Weights votes by the model's implicit confidence P(true) in each reasoning path, improving selection accuracy.
Source: Self · Unit: path · Role: weighted vote · Access: MS
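Confidence-weighted voting is a small change to the same aggregation: each path's vote is scaled by its confidence. A hypothetical sketch where `paths` holds (answer, P(true)) pairs:

```python
def weighted_vote(paths):
    """Sum each answer's per-path confidences and return the top scorer."""
    scores = {}
    for answer, confidence in paths:
        scores[answer] = scores.get(answer, 0.0) + confidence
    return max(scores, key=scores.get)

# "B" has two votes to "A"'s one, but "A"'s single
# high-confidence path (0.9 > 0.4 + 0.3) flips the outcome
best = weighted_vote([("A", 0.9), ("B", 0.4), ("B", 0.3)])
```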
Chen et al. · 2023
Prompts the model to select its own most consistent response without external voting, internalizing confidence-based selection.
Source: Self · Unit: response · Role: select · Access: MS
Jeong & Choi · 2025
Combines self-consistency frequency with ambiguity detection to re-score answers, improving selection robustness.
Source: Self · Unit: answer · Role: trigger, rescore · Access: MS
Lightman et al. · ICLR 2024
Auxiliary reward model that scores each step in a reasoning path, then reranks complete solutions by accumulated rewards.
Source: Auxiliary · Unit: step/solution · Role: rerank · Access: AV, MS
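A process-reward reranker can be sketched as follows. `step_scorer` is a hypothetical stand-in for the auxiliary reward model, and mean aggregation is one common choice (min and product are others):

```python
def prm_rerank(solutions, step_scorer):
    """Pick the solution whose steps earn the highest mean process reward."""
    def mean_reward(steps):
        rewards = [step_scorer(step) for step in steps]
        return sum(rewards) / len(rewards)
    return max(solutions, key=mean_reward)

# Toy steps carry precomputed rewards in place of real PRM calls
sols = [
    [{"text": "step a1", "reward": 0.9}, {"text": "step a2", "reward": 0.2}],
    [{"text": "step b1", "reward": 0.7}, {"text": "step b2", "reward": 0.8}],
]
best = prm_rerank(sols, lambda step: step["reward"])
```

Note how the uniformly strong solution (mean 0.75) beats the one with a single impressive but unreliable step (mean 0.55).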
Wang et al. · ACL 2024
Automatically generates step-level labels for solutions, enabling training of process rewards for selection.
Source: Auxiliary · Unit: step/solution · Role: rerank · Access: AV, MS
Zhou et al. · NeurIPS 2025
Elicits and calibrates verbal confidence from models, using it to select best responses in diverse settings.
Source: Self · Unit: answer · Role: calibrate, select · Access: BB

Adaptive Stopping & Search

Instead of generating a fixed number of samples or search steps, confidence can decide when enough exploration has been done and it is safe to commit to a solution.

Aggarwal et al. · EMNLP 2023
Stops generating additional reasoning paths once the majority answer achieves stable agreement.
Source: Self · Unit: answer set · Role: stop · Access: MS
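A minimal version of this idea stops sampling once the leading answer is a fixed margin ahead of the runner-up; the paper itself uses a Dirichlet-based stopping criterion, so treat the margin rule below as an illustrative simplification:

```python
from collections import Counter

def sample_until_stable(sample_fn, max_samples=40, margin=3):
    """Draw answers one at a time; stop when the top answer leads by `margin`."""
    counts = Counter()
    for n in range(1, max_samples + 1):
        counts[sample_fn()] += 1
        ranked = counts.most_common(2)
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        if ranked[0][1] - runner_up >= margin:
            break
    return ranked[0][0], n

# Deterministic toy sampler standing in for temperature sampling
answers = iter(["A", "B", "A", "A", "A"])
ans, n_used = sample_until_stable(lambda: next(answers))
```

On easy questions where samples agree quickly, this commits after a handful of draws instead of a fixed large budget.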
Huang et al. · 2025
Uses calibrated confidence estimates to determine when to stop voting and commit to the selected answer.
Source: Self · Unit: candidate · Role: vote, stop · Access: MS, FT
Fu et al. · 2025
Monitors the confidence of low-probability tokens in generation; stops or filters when confidence drops below threshold.
Source: Self · Unit: token group · Role: filter, stop · Access: WB, MS
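One way to realize this is to score each trace by the mean log-probability of its least confident tokens, dropping (or early-stopping) traces that fall below a threshold. The group size and threshold here are illustrative, not the paper's values:

```python
def trace_confidence(token_logprobs, k=5):
    """Mean log-probability of the k least confident tokens in a trace."""
    worst = sorted(token_logprobs)[:k]
    return sum(worst) / len(worst)

def keep_trace(token_logprobs, threshold=-2.5, k=5):
    """Filter rule: keep a trace only if its weakest tokens stay confident."""
    return trace_confidence(token_logprobs, k) >= threshold
```

Focusing on the weakest tokens, rather than the average over all tokens, keeps a few very uncertain steps from hiding inside an otherwise fluent generation.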
Li et al. · ACL Findings 2025
Compares answer-level vs. token-level probabilities to decide whether to maintain the answer or revise it.
Source: Self · Unit: response/turn · Role: maintain, revise · Access: WB
Yao et al. · NeurIPS 2023
Scores intermediate reasoning states with value or voting confidence; expands promising branches and prunes low-confidence ones.
Source: Self · Unit: thought state · Role: expand, prune · Access: MS
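The expand-and-prune loop can be sketched as a beam search over thought states. `expand` and `value` stand in for the LLM's proposal and evaluation prompts (hypothetical interfaces); the usage example uses plain numbers as toy states:

```python
def search_thoughts(root, expand, value, beam_width=2, depth=3):
    """BFS over thought states: expand the frontier, keep the top-valued few."""
    frontier = [root]
    for _ in range(depth):
        children = [child for state in frontier for child in expand(state)]
        if not children:
            break
        # Prune: keep only the beam_width most promising branches
        frontier = sorted(children, key=value, reverse=True)[:beam_width]
    return max(frontier, key=value)

# Toy problem: states are numbers, expansion adds 1 or 2, value is the state
best = search_thoughts(0, lambda s: [s + 1, s + 2], lambda s: s)
```

Pruning is what makes the search tractable: the frontier stays at `beam_width` states regardless of depth, so the confidence signal directly controls compute.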
Qiao et al. · EMNLP 2025
Uses confidence in self-reflection steps to decide whether to stop iterating or compress the trajectory.
Source: Self · Unit: step/trajectory · Role: stop, compress · Access: FT

Decoding Control

Rather than selecting finished outputs, these methods reshape the token distribution during generation based on confidence signals.

Li et al. · ACL 2023
Reweights tokens by the gap between expert and amateur model logits, boosting confident expert predictions.
Source: Hybrid · Unit: token · Role: reweight · Access: 2M
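A self-contained sketch of the scoring rule, with raw logit lists standing in for two model calls; `alpha` implements the paper's plausibility constraint (only tokens within an `alpha` factor of the expert's top probability stay eligible):

```python
import math

def contrastive_scores(expert_logits, amateur_logits, alpha=0.1):
    """Score tokens by expert-minus-amateur log-prob, masking implausible ones."""
    def log_softmax(logits):
        m = max(logits)
        z = m + math.log(sum(math.exp(l - m) for l in logits))
        return [l - z for l in logits]

    expert_lp = log_softmax(expert_logits)
    amateur_lp = log_softmax(amateur_logits)
    cutoff = math.log(alpha) + max(expert_lp)  # plausibility threshold
    return [
        e - a if e >= cutoff else float("-inf")
        for e, a in zip(expert_lp, amateur_lp)
    ]

# With a flat amateur, the expert's top token wins; the long-tail
# third token falls below the plausibility cutoff and is masked out.
scores = contrastive_scores([2.0, 1.0, -3.0], [0.0, 0.0, 0.0])
```

The plausibility mask is essential: without it, the expert-minus-amateur gap can promote tokens that both models consider nearly impossible.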
Chuang et al. · ICLR 2024
Contrasts predictions from early and late layers to identify confident, factual tokens.
Source: Self · Unit: token · Role: reweight · Access: WB
Huang & Chen · 2025
Contrasts model behavior with/without context masking to detect factual (context-dependent) tokens.
Source: Self · Unit: token · Role: reweight · Access: WB
Khandelwal et al. · EMNLP 2025
Identifies tokens where prior and context disagree, blending confidences to improve factuality.
Source: Hybrid · Unit: token · Role: blend, reweight · Access: WB
Zhang et al. · EMNLP 2025
Trains a gate to selectively apply contrastive decoding when model confidence is low.
Source: Self · Unit: token · Role: gate, reweight · Access: WB, FT
Wei et al. · 2024
Scores sub-structures (phrases, clauses) by confidence and reranks beam hypotheses accordingly.
Source: Self · Unit: sub-structure · Role: beam rerank · Access: WB, FT

Summary Table

| Method | Source | Signal | Unit | Role | Access |
| --- | --- | --- | --- | --- | --- |
| **Output Selection** | | | | | |
| Self-Consistency | Self | Answer agreement | Candidate | Vote | MS |
| CISC | Self | Path P(true) | Path | Weighted vote | MS |
| Universal SC | Self | Self-selection | Response | Select | MS |
| ACR | Self | SC + ambiguity | Answer | Trigger, rescore | MS |
| PRM | Auxiliary | Step rewards | Step/solution | Rerank | AV, MS |
| Math-Shepherd | Auxiliary | Process labels | Step/solution | Rerank | AV, MS |
| SteerConf | Self | Verbal confidence | Answer | Calibrate, select | BB |
| **Adaptive Stopping & Search** | | | | | |
| Adaptive-Consistency | Self | Answer stability | Answer set | Stop | MS |
| Efficient TTS | Self | Response confidence | Candidate | Vote, stop | MS, FT |
| DeepConf | Self | Low-prob tokens | Token group | Filter, stop | WB, MS |
| Firm-or-Fickle | Self | Answer vs. token prob | Response/turn | Maintain, revise | WB |
| ToT | Self | State value/vote | Thought state | Expand, prune | MS |
| ConCISE | Self | Reflection confidence | Step/trajectory | Stop, compress | FT |
| **Decoding Control** | | | | | |
| Contrastive Decoding | Hybrid | Expert-amateur gap | Token | Reweight | 2M |
| DoLa | Self | Layer contrast | Token | Reweight | WB |
| Delta-CD | Self | Masked-context gap | Token | Reweight | WB |
| CoCoA | Hybrid | Prior-context conflict | Token | Blend, reweight | WB |
| ActLCD | Self | Learned trigger | Token | Gate, reweight | WB, FT |
| CABS | Self | Sub-structure confidence | Sub-structure | Beam rerank | WB, FT |

Access legend: MS = multiple samples, WB = white-box (internal states), FT = fine-tuned, AV = auxiliary verifier, BB = black-box, 2M = two models (e.g., expert + amateur)

Discussion

Inference confidence operates at three distinct time scales:

  • Candidate → Select/Aggregate. Given multiple complete outputs (from sampling or beam search), confidence decides which to return or how to aggregate. Self-consistency and process rewards both operate here.
  • State → Continue/Stop/Revise. At intermediate points (after each token, thought, or step), confidence decides whether to commit (stop), continue exploring (expand), or reconsider (revise). Tree-of-Thought and adaptive consistency exemplify this layer.
  • Token → Reshape Distribution. Within a single forward pass, confidence reshapes logit distributions to boost high-confidence tokens and suppress hallucinations. Contrastive decoding and DoLa operate here.

These signals are not interchangeable. A high self-consistency score does not imply high logit probability. A low-confidence token can still occur in a high-confidence reasoning path. The best systems leverage signals at all three levels, with careful calibration to avoid redundancy and propagation of errors.