Confidence-Gated RAG Systems

Confidence at every stage of the retrieval-augmented generation pipeline (§6)

Overview

Retrieval-augmented generation (RAG) is a four-stage pipeline: (1) decide whether to retrieve, (2) retrieve passages, (3) filter and rank the retrieved context, and (4) generate with that context, detect hallucinations, and decide whether to abstain.

Confidence controls all four stages. Low confidence in parametric knowledge triggers retrieval (FLARE, SELF-RAG). Low retrieval quality confidence gates context inclusion (CRAG, FILCO). Low grounding confidence in generated claims triggers refinement or abstention (HALT-RAG, Conformal-RAG). This section surveys confidence-gated approaches across the RAG pipeline.

When to Retrieve

Instead of always retrieving, confidence in parametric knowledge can gate retrieval, saving computation when the model is confident or the query is simple.

FLARE · Jiang et al. · EMNLP 2023
Monitors token confidence during generation; low-confidence tokens trigger online retrieval of relevant passages.
Source: Self · Unit: sentence/token · Role: trigger · Access: Mid
DRAGIN · Su et al. · 2024
Detects uncertain information needs; uses entropy to trigger retrieval at strategic points in generation.
Source: Self · Unit: context/step · Role: trigger, query · Access: Mid
Adaptive-RAG · Jeong et al. · 2024
Routes queries by inferred complexity; simple questions skip retrieval, complex ones trigger multi-hop retrieval.
Source: External · Unit: query · Role: route strategy · Access: Pre, FT
SKR · Wang et al. · 2023
Elicits the model's confidence in its own knowledge; retrieves only when confidence is low.
Source: Self · Unit: query · Role: retrieve-or-skip · Access: Pre
SELF-RAG · Asai et al. · ICLR 2024 (Oral)
Generates special reflection tokens that score query relevance, passage utility, and answer quality, controlling both retrieval and generation.
Source: Self · Unit: query/passage/answer · Role: trigger, critique · Access: Pre, Ctx, FT
SEAKR · Yao et al. · ACL 2025
Routes queries based on internal-state uncertainty; triggers retrieval, reranking, and routing decisions.
Source: Mechanistic · Unit: query/snippet · Role: trigger, rerank, route · Access: Pre, Ctx, WB
SUGAR · Zubkova et al. · ICASSP 2025
Uses semantic uncertainty to predict query difficulty; triggers deeper retrieval and iterative refinement.
Source: Self · Unit: query · Role: trigger, depth · Access: Pre, MS
PAIRS · Chen et al. · 2025
Monitors agreement between parametric and retrieved answers; low agreement triggers filtering and re-retrieval.
Source: Self · Unit: query/doc · Role: trigger, filter · Access: Pre, Ctx
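Most of the triggers above reduce to thresholding a per-token or per-step confidence signal. A minimal FLARE-style gate can be sketched as follows, assuming access to the decoder's token log-probabilities; the threshold value is illustrative, not any system's published setting:

```python
import math

def should_retrieve(token_logprobs, theta=0.6):
    """FLARE-style gate (sketch): trigger retrieval iff any token in the
    drafted sentence has probability below theta, i.e. the model was
    unsure about at least part of what it just generated."""
    return any(math.exp(lp) < theta for lp in token_logprobs)

# Toy drafts: per-token log-probabilities for two generated sentences.
confident_draft = [-0.05, -0.10, -0.02]  # all token probs > 0.9
hedged_draft    = [-0.05, -1.20, -0.02]  # middle token prob ~ 0.30
```

DRAGIN-style variants replace the per-token probability with an entropy measure over the next-token distribution, but the gating logic is the same shape.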

What Context to Keep

After retrieval, confidence in passage relevance and utility determines which passages to include in the context window.

CRAG · Yan et al. · 2024
Scores retrieved passages by quality; uses confidence to correct or fall back to parametric answers.
Source: External · Unit: retrieval set · Role: correct, fallback · Access: Ctx, Aux
FILCO · Wang et al. · 2023
Filters and ranks passages by usefulness to the task using auxiliary scoring models.
Source: External · Unit: passage/sentence · Role: filter · Access: Ctx, FT
InfoGain-RAG · Wang et al. · EMNLP 2025
Selects documents that maximize information gain about the query, filtering low-value passages.
Source: Self · Unit: document · Role: rerank, filter · Access: Ctx
SKILL-RAG · Isoda · 2025
Scores sentences by the model's self-knowledge and context utility, filtering irrelevant passages.
Source: Self · Unit: sentence · Role: filter · Access: Ctx, FT
Sparse-RAG · Zhu et al. · 2024
Uses confidence-based sparsification to select only the most relevant documents, reducing context size.
Source: Self · Unit: document · Role: select, sparsify · Access: Ctx, Mid
UncertaintyRAG · Li et al. · 2024
Scores passages by span-level uncertainty and signal-to-noise ratio; retrieves and ranks by confidence.
Source: Self · Unit: span/chunk · Role: score, retrieve · Access: Ctx, FT
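These filters share a common shape: score each candidate passage, keep the ones that clear a relevance threshold, and signal a fallback to parametric knowledge when nothing survives. A minimal CRAG/FILCO-flavoured sketch, where the scorer is a stand-in for whatever relevance model the system actually uses (the lexical-overlap scorer here is purely illustrative):

```python
def filter_context(passages, score, keep_tau=0.5, max_keep=3):
    """Keep at most max_keep passages whose relevance score clears
    keep_tau, best-first. An empty result tells the caller to fall
    back to the parametric answer (CRAG's corrective branch)."""
    scored = sorted(((score(p), p) for p in passages), reverse=True)
    return [p for s, p in scored if s >= keep_tau][:max_keep]

def overlap_score(query):
    """Toy stand-in scorer: fraction of query terms a passage mentions."""
    terms = set(query.lower().split())
    return lambda p: len(terms & set(p.lower().split())) / len(terms)
```

Usage: `filter_context(candidates, overlap_score("who wrote hamlet"))` keeps only passages mentioning at least half of the query terms, ranked best-first.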

Groundedness Detection

Even with context, models can hallucinate. These methods detect when generated claims are not grounded in retrieved passages.

ReDeEP · Sun et al. · 2024
Uses mechanistic probes (entailment check score + parametric knowledge signal) to detect ungrounded claims.
Source: Mechanistic · Unit: answer/mechanism · Role: detect, mitigate · Access: Post, WB
HALT-RAG · Goswami & Kurra · 2025
Uses an ensemble of NLI models with calibrated confidence to verify claim grounding; abstains if unsure.
Source: External · Unit: claim/answer · Role: detect, abstain · Access: Post, Aux
FRANQ · Fadeeva et al. · 2025
Conditions uncertainty quantification on faithfulness signals; detects claims likely to be hallucinated.
Source: External · Unit: claim · Role: detect · Access: Post, Aux
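All three detectors reduce a claim-level grounding decision to a score against the retrieved evidence. A HALT-RAG-flavoured sketch, where `entail_prob` stands in for a calibrated NLI model; the lexical-recall heuristic below is a toy substitute for illustration only:

```python
def is_grounded(claim, passages, entail_prob, tau=0.8):
    """A claim counts as grounded if some retrieved passage entails it
    with probability >= tau; otherwise flag it for refinement or,
    in an abstaining system, refuse to assert it."""
    return max((entail_prob(p, claim) for p in passages), default=0.0) >= tau

def toy_entail_prob(premise, hypothesis):
    """Toy stand-in for an NLI ensemble: word recall of the hypothesis
    against the premise (a real system would use calibrated NLI scores)."""
    h = set(hypothesis.lower().split())
    return len(h & set(premise.lower().split())) / len(h)
```

The `default=0.0` handles the empty-retrieval case: with no evidence, every claim is treated as ungrounded, which composes naturally with the abstention policies in the next group.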

Abstention & Conformal Prediction

Rather than always generating, these methods use confidence to decide when to abstain, with formal coverage guarantees.

TRAQ · Li et al. · NAACL 2024
Predicts conformal sets of passages and answers with guaranteed coverage, based on calibrated confidence.
Source: Hybrid · Unit: passage/answer set · Role: set-predict · Access: Ctx, Post, CP
ConFLARE · Rouzrokh et al. · 2024
Calibrates retrieval thresholds using conformal prediction to ensure retrieval quality coverage.
Source: External · Unit: chunk/retrieval set · Role: calibrate · Access: Ctx, CP
Chakraborty et al. · 2025
Uses conformal prediction to filter snippets with guaranteed relevance coverage.
Source: External · Unit: snippet · Role: filter · Access: Ctx, CP
Conformal-RAG · Feng et al. · SIGIR 2025
Applies conditional conformal prediction to guarantee factuality coverage for sub-claims.
Source: Hybrid · Unit: sub-claim · Role: filter · Access: Post, CP
Divide-Then-Align · Sun et al. · ACL 2025
Classifies queries by knowledge boundary; abstains for ambiguous or uncertain cases, aligning with retrieval.
Source: Hybrid · Unit: query · Role: abstain, align · Access: Post, FT
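The coverage guarantees in this group come from split conformal prediction: hold out a calibration set, compute nonconformity scores for known-correct outputs, and at test time accept only outputs whose score falls below the calibrated quantile. A minimal sketch (the scores and miscoverage level are illustrative):

```python
import math

def conformal_threshold(cal_scores, alpha=0.1):
    """Split conformal calibration: return q_hat such that accepting
    test outputs with nonconformity score <= q_hat guarantees coverage
    >= 1 - alpha, under exchangeability of calibration and test data."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))   # conformal quantile index
    return sorted(cal_scores)[min(k, n) - 1]

def should_abstain(score, q_hat):
    """Abstain whenever the test score exceeds the calibrated threshold."""
    return score > q_hat
```

TRAQ and Conformal-RAG apply this recipe at different units (passage/answer sets vs. sub-claims), but both inherit the same finite-sample guarantee from the calibration step.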

Summary Table

Method | Source | Signal | Unit | Role | Access

When to Retrieve
FLARE | Self | Token confidence | Sentence/token | Trigger | Mid
DRAGIN | Self | Info-need entropy | Context/step | Trigger, query | Mid
Adaptive-RAG | External | Complexity routing | Query | Route strategy | Pre, FT
SKR | Self | Self-knowledge | Query | Retrieve-or-skip | Pre
SELF-RAG | Self | Reflection tokens | Query/passage/answer | Trigger, critique | Pre, Ctx, FT
SEAKR | Mechanistic | Internal uncertainty | Query/snippet | Trigger, rerank, route | Pre, Ctx, WB
SUGAR | Self | Semantic uncertainty | Query | Trigger, depth | Pre, MS
PAIRS | Self | Parametric agreement | Query/doc | Trigger, filter | Pre, Ctx

What Context to Keep
CRAG | External | Retrieval quality | Retrieval set | Correct, fallback | Ctx, Aux
FILCO | External | Usefulness score | Passage/sentence | Filter | Ctx, FT
InfoGain-RAG | Self | Info gain | Document | Rerank, filter | Ctx
SKILL-RAG | Self | Self-knowledge score | Sentence | Filter | Ctx, FT
Sparse-RAG | Self | Relevance confidence | Document | Select, sparsify | Ctx, Mid
UncertaintyRAG | Self | Span uncertainty | Span/chunk | Score, retrieve | Ctx, FT

Groundedness Detection & Abstention
ReDeEP | Mechanistic | ECS + PKS | Answer/mechanism | Detect, mitigate | Post, WB
HALT-RAG | External | NLI ensemble | Claim/answer | Detect, abstain | Post, Aux
FRANQ | External | Faithfulness UQ | Claim | Detect | Post, Aux
TRAQ | Hybrid | Conformal confidence | Passage/answer set | Set-predict | Ctx, Post, CP
ConFLARE | External | Similarity threshold | Chunk/set | Calibrate | Ctx, CP
Principled Ctx Eng | External | Snippet relevance | Snippet | Filter | Ctx, CP
Conformal-RAG | Hybrid | Conformal factuality | Sub-claim | Filter | Post, CP
Divide-Then-Align | Hybrid | Knowledge boundary | Query | Abstain, align | Post, FT

Discussion

RAG confidence systems are highly source-sensitive. Self-confidence works well for detecting parametric knowledge gaps (FLARE, SKR) but is less reliable for judging retrieval quality. Auxiliary signals (CRAG, HALT-RAG) excel at post-hoc verification but add latency. Mechanistic probes (ReDeEP, SEAKR) offer interpretability but require white-box access.

The best systems combine signals:

  • Trigger retrieval using self-confidence (token uncertainty) or external routers (Adaptive-RAG).
  • Filter retrieved passages using external auxiliary signals (CRAG, FILCO) or mechanistic measures (SEAKR).
  • Verify grounding using external verifiers (HALT-RAG) or conformal methods (Conformal-RAG) with formal coverage guarantees.
  • Abstain strategically when retrieval fails or claims are ungrounded, preserving user trust.
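Taken together, the four bullets above describe one gated control flow. A skeleton of that flow, with every component a placeholder callable; the names, signatures, and thresholds are assumptions for illustration, not any particular system's API:

```python
def confidence_gated_rag(query, *, answer_directly, model_confidence,
                         retrieve, passage_score, generate, is_grounded,
                         conf_tau=0.75, rel_tau=0.5):
    """Gated RAG sketch: (1) skip retrieval when parametric confidence is
    high; (2)-(3) retrieve, then keep only passages clearing the relevance
    gate; (4) generate with context, verify grounding, abstain on failure.
    Returns (answer_or_None, route), where None means abstain."""
    # Stage 1: SKR/FLARE-style trigger on self-confidence.
    if model_confidence(query) >= conf_tau:
        return answer_directly(query), "parametric"
    # Stages 2-3: retrieve and filter (CRAG/FILCO-style gate).
    kept = [p for p in retrieve(query) if passage_score(query, p) >= rel_tau]
    if not kept:
        return None, "abstain:no-context"  # CRAG would fall back instead
    # Stage 4: generate, then verify grounding (HALT-RAG-style check).
    answer = generate(query, kept)
    if not is_grounded(answer, kept):
        return None, "abstain:ungrounded"
    return answer, "grounded"
```

The route string is returned alongside the answer so that callers can log which gate fired, which is useful when tuning the thresholds per deployment.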