Confidence-Gated RAG Systems

Confidence at every stage of the retrieval-augmented generation pipeline (§6)

Overview

Retrieval-augmented generation (RAG) is a four-stage pipeline: (1) decide whether to retrieve, (2) retrieve passages, (3) filter and rank the retrieved context, and (4) generate with that context, detect hallucinations, and decide whether to abstain.

Confidence controls all four stages. Low confidence in parametric knowledge triggers retrieval (FLARE, SELF-RAG). Low retrieval quality confidence gates context inclusion (CRAG, FILCO). Low grounding confidence in generated claims triggers refinement or abstention (HALT-RAG, Conformal-RAG). This section surveys confidence-gated approaches across the RAG pipeline.

When to Retrieve

Instead of always retrieving, confidence in parametric knowledge can gate retrieval, saving computation when the model is confident or the query is simple.

FLARE · Jiang et al. · EMNLP 2023
Monitors token confidence during generation; low-confidence tokens trigger online retrieval of relevant passages.
Source: Self · Unit: sentence/token · Role: trigger · Access: Mid
DRAGIN · Su et al. · 2024
Detects uncertain information needs; uses entropy to trigger retrieval at strategic points in generation.
Source: Self · Unit: context/step · Role: trigger, query · Access: Mid
Adaptive-RAG · Jeong et al. · 2024
Routes queries by inferred complexity; simple questions skip retrieval, complex ones trigger multi-hop retrieval.
Source: External · Unit: query · Role: route strategy · Access: Pre, FT
SKR · Wang et al. · 2023
Elicits the model's confidence in its own knowledge; retrieves only when confidence is low.
Source: Self · Unit: query · Role: retrieve-or-skip · Access: Pre
SELF-RAG · Asai et al. · ICLR 2024 (Oral)
Generates special reflection tokens that score query relevance, passage utility, and answer quality, controlling both retrieval and generation.
Source: Self · Unit: query/passage/answer · Role: trigger, critique · Access: Pre, Ctx, FT
SEAKR · Yao et al. · ACL 2025
Routes queries based on internal-state uncertainty; triggers retrieval, reranking, and routing decisions.
Source: Mechanistic · Unit: query/snippet · Role: trigger, rerank, route · Access: Pre, Ctx, WB
SUGAR · Zubkova et al. · ICASSP 2025
Uses semantic uncertainty to predict query difficulty; triggers deeper retrieval and iterative refinement.
Source: Self · Unit: query · Role: trigger, depth · Access: Pre, MS
PAIRS · Chen et al. · 2025
Monitors agreement between parametric and retrieved answers; low agreement triggers filtering and re-retrieval.
Source: Self · Unit: query/doc · Role: trigger, filter · Access: Pre, Ctx
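Most of the triggers above reduce to thresholding a per-token or per-step confidence signal. A minimal FLARE-style gate can be sketched as follows, assuming access to the decoder's token log-probabilities; the threshold value is illustrative, not any system's published setting:

```python
import math

def should_retrieve(token_logprobs, theta=0.6):
    """FLARE-style gate (sketch): trigger retrieval iff any token in the
    drafted sentence has probability below theta, i.e. the model was
    unsure about at least part of what it just generated."""
    return any(math.exp(lp) < theta for lp in token_logprobs)

# Toy drafts: per-token log-probabilities for two generated sentences.
confident_draft = [-0.05, -0.10, -0.02]  # all token probs > 0.9
hedged_draft    = [-0.05, -1.20, -0.02]  # middle token prob ~ 0.30
```

DRAGIN-style variants replace the per-token probability with an entropy measure over the next-token distribution, but the gating logic is the same shape.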

What Context to Keep

After retrieval, confidence in passage relevance and utility determines which passages to include in the context window.

CRAG · Yan et al. · 2024
Scores retrieved passages by quality; uses confidence to correct or fall back to parametric answers.
Source: External · Unit: retrieval set · Role: correct, fallback · Access: Ctx, Aux
FILCO · Wang et al. · 2023
Filters and ranks passages by usefulness to the task using auxiliary scoring models.
Source: External · Unit: passage/sentence · Role: filter · Access: Ctx, FT
InfoGain-RAG · Wang et al. · EMNLP 2025
Selects documents that maximize information gain about the query, filtering low-value passages.
Source: Self · Unit: document · Role: rerank, filter · Access: Ctx
SKILL-RAG · Isoda · 2025
Scores sentences by the model's self-knowledge and context utility, filtering irrelevant passages.
Source: Self · Unit: sentence · Role: filter · Access: Ctx, FT
Sparse-RAG · Zhu et al. · 2024
Uses confidence-based sparsification to select only the most relevant documents, reducing context size.
Source: Self · Unit: document · Role: select, sparsify · Access: Ctx, Mid
UncertaintyRAG · Li et al. · 2024
Scores passages by span-level uncertainty and signal-to-noise ratio; retrieves and ranks by confidence.
Source: Self · Unit: span/chunk · Role: score, retrieve · Access: Ctx, FT
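These filters share a common shape: score each candidate passage, keep the ones that clear a relevance threshold, and signal a fallback to parametric knowledge when nothing survives. A minimal CRAG/FILCO-flavoured sketch, where the scorer is a stand-in for whatever relevance model the system actually uses (the lexical-overlap scorer here is purely illustrative):

```python
def filter_context(passages, score, keep_tau=0.5, max_keep=3):
    """Keep at most max_keep passages whose relevance score clears
    keep_tau, best-first. An empty result tells the caller to fall
    back to the parametric answer (CRAG's corrective branch)."""
    scored = sorted(((score(p), p) for p in passages), reverse=True)
    return [p for s, p in scored if s >= keep_tau][:max_keep]

def overlap_score(query):
    """Toy stand-in scorer: fraction of query terms a passage mentions."""
    terms = set(query.lower().split())
    return lambda p: len(terms & set(p.lower().split())) / len(terms)
```

Usage: `filter_context(candidates, overlap_score("who wrote hamlet"))` keeps only passages mentioning at least half of the query terms, ranked best-first.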

Groundedness Detection

Even with context, models can hallucinate. These methods detect when generated claims are not grounded in retrieved passages.

ReDeEP · Sun et al. · 2024
Uses mechanistic probes (entailment check score + parametric knowledge signal) to detect ungrounded claims.
Source: Mechanistic · Unit: answer/mechanism · Role: detect, mitigate · Access: Post, WB
HALT-RAG · Goswami & Kurra · 2025
Uses an ensemble of NLI models with calibrated confidence to verify claim grounding; abstains if unsure.
Source: External · Unit: claim/answer · Role: detect, abstain · Access: Post, Aux
FRANQ · Fadeeva et al. · 2025
Conditions uncertainty quantification on faithfulness signals; detects claims likely to be hallucinated.
Source: External · Unit: claim · Role: detect · Access: Post, Aux
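All three detectors reduce a claim-level grounding decision to a score against the retrieved evidence. A HALT-RAG-flavoured sketch, where `entail_prob` stands in for a calibrated NLI model; the lexical-recall heuristic below is a toy substitute for illustration only:

```python
def is_grounded(claim, passages, entail_prob, tau=0.8):
    """A claim counts as grounded if some retrieved passage entails it
    with probability >= tau; otherwise flag it for refinement or,
    in an abstaining system, refuse to assert it."""
    return max((entail_prob(p, claim) for p in passages), default=0.0) >= tau

def toy_entail_prob(premise, hypothesis):
    """Toy stand-in for an NLI ensemble: word recall of the hypothesis
    against the premise (a real system would use calibrated NLI scores)."""
    h = set(hypothesis.lower().split())
    return len(h & set(premise.lower().split())) / len(h)
```

The `default=0.0` handles the empty-retrieval case: with no evidence, every claim is treated as ungrounded, which composes naturally with the abstention policies in the next group.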

Abstention & Conformal Prediction

Rather than always generating, these methods use confidence to decide when to abstain, with formal coverage guarantees.

TRAQ · Li et al. · NAACL 2024
Predicts conformal sets of passages and answers with guaranteed coverage, based on calibrated confidence.
Source: Hybrid · Unit: passage/answer set · Role: set-predict · Access: Ctx, Post, CP
ConFLARE · Rouzrokh et al. · 2024
Calibrates retrieval thresholds using conformal prediction to ensure retrieval quality coverage.
Source: External · Unit: chunk/retrieval set · Role: calibrate · Access: Ctx, CP
Chakraborty et al. · 2025
Uses conformal prediction to filter snippets with guaranteed relevance coverage.
Source: External · Unit: snippet · Role: filter · Access: Ctx, CP
Conformal-RAG · Feng et al. · SIGIR 2025
Applies conditional conformal prediction to guarantee factuality coverage for sub-claims.
Source: Hybrid · Unit: sub-claim · Role: filter · Access: Post, CP
Divide-Then-Align · Sun et al. · ACL 2025
Classifies queries by knowledge boundary; abstains for ambiguous or uncertain cases, aligning with retrieval.
Source: Hybrid · Unit: query · Role: abstain, align · Access: Post, FT
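The coverage guarantees in this group come from split conformal prediction: hold out a calibration set, compute nonconformity scores for known-correct outputs, and at test time accept only outputs whose score falls below the calibrated quantile. A minimal sketch (the scores and miscoverage level are illustrative):

```python
import math

def conformal_threshold(cal_scores, alpha=0.1):
    """Split conformal calibration: return q_hat such that accepting
    test outputs with nonconformity score <= q_hat guarantees coverage
    >= 1 - alpha, under exchangeability of calibration and test data."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))   # conformal quantile index
    return sorted(cal_scores)[min(k, n) - 1]

def should_abstain(score, q_hat):
    """Abstain whenever the test score exceeds the calibrated threshold."""
    return score > q_hat
```

TRAQ and Conformal-RAG apply this recipe at different units (passage/answer sets vs. sub-claims), but both inherit the same finite-sample guarantee from the calibration step.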

Summary Table

Method | Source | Signal | Unit | Role | Access

When to Retrieve
FLARE | Self | Token confidence | Sentence/token | Trigger | Mid
DRAGIN | Self | Info-need entropy | Context/step | Trigger, query | Mid
Adaptive-RAG | External | Complexity routing | Query | Route strategy | Pre, FT
SKR | Self | Self-knowledge | Query | Retrieve-or-skip | Pre
SELF-RAG | Self | Reflection tokens | Query/passage/answer | Trigger, critique | Pre, Ctx, FT
SEAKR | Mechanistic | Internal uncertainty | Query/snippet | Trigger, rerank, route | Pre, Ctx, WB
SUGAR | Self | Semantic uncertainty | Query | Trigger, depth | Pre, MS
PAIRS | Self | Parametric agreement | Query/doc | Trigger, filter | Pre, Ctx

What Context to Keep
CRAG | External | Retrieval quality | Retrieval set | Correct, fallback | Ctx, Aux
FILCO | External | Usefulness score | Passage/sentence | Filter | Ctx, FT
InfoGain-RAG | Self | Info gain | Document | Rerank, filter | Ctx
SKILL-RAG | Self | Self-knowledge score | Sentence | Filter | Ctx, FT
Sparse-RAG | Self | Relevance confidence | Document | Select, sparsify | Ctx, Mid
UncertaintyRAG | Self | Span uncertainty | Span/chunk | Score, retrieve | Ctx, FT

Groundedness Detection & Abstention
ReDeEP | Mechanistic | ECS + PKS | Answer/mechanism | Detect, mitigate | Post, WB
HALT-RAG | External | NLI ensemble | Claim/answer | Detect, abstain | Post, Aux
FRANQ | External | Faithfulness UQ | Claim | Detect | Post, Aux
TRAQ | Hybrid | Conformal confidence | Passage/answer set | Set-predict | Ctx, Post, CP
ConFLARE | External | Similarity threshold | Chunk/set | Calibrate | Ctx, CP
Principled Ctx Eng | External | Snippet relevance | Snippet | Filter | Ctx, CP
Conformal-RAG | Hybrid | Conformal factuality | Sub-claim | Filter | Post, CP
Divide-Then-Align | Hybrid | Knowledge boundary | Query | Abstain, align | Post, FT

Discussion

RAG confidence systems are highly source-sensitive. Self-confidence works well for detecting parametric knowledge gaps (FLARE, SKR) but is less reliable for judging retrieval quality. Auxiliary signals (CRAG, HALT-RAG) excel at post-hoc verification but add latency. Mechanistic probes (ReDeEP, SEAKR) offer interpretability but require white-box access.

The best systems combine signals:

  • Trigger retrieval using self-confidence (token uncertainty) or external routers (Adaptive-RAG).
  • Filter retrieved passages using external auxiliary signals (CRAG, FILCO) or mechanistic measures (SEAKR).
  • Verify grounding using external verifiers (HALT-RAG) or conformal methods (Conformal-RAG) with formal coverage guarantees.
  • Abstain strategically when retrieval fails or claims are ungrounded, preserving user trust.
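Taken together, the four bullets above describe one gated control flow. A skeleton of that flow, with every component a placeholder callable; the names, signatures, and thresholds are assumptions for illustration, not any particular system's API:

```python
def confidence_gated_rag(query, *, answer_directly, model_confidence,
                         retrieve, passage_score, generate, is_grounded,
                         conf_tau=0.75, rel_tau=0.5):
    """Gated RAG sketch: (1) skip retrieval when parametric confidence is
    high; (2)-(3) retrieve, then keep only passages clearing the relevance
    gate; (4) generate with context, verify grounding, abstain on failure.
    Returns (answer_or_None, route), where None means abstain."""
    # Stage 1: SKR/FLARE-style trigger on self-confidence.
    if model_confidence(query) >= conf_tau:
        return answer_directly(query), "parametric"
    # Stages 2-3: retrieve and filter (CRAG/FILCO-style gate).
    kept = [p for p in retrieve(query) if passage_score(query, p) >= rel_tau]
    if not kept:
        return None, "abstain:no-context"  # CRAG would fall back instead
    # Stage 4: generate, then verify grounding (HALT-RAG-style check).
    answer = generate(query, kept)
    if not is_grounded(answer, kept):
        return None, "abstain:ungrounded"
    return answer, "grounded"
```

The route string is returned alongside the answer so that callers can log which gate fired, which is useful when tuning the thresholds per deployment.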