Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling

VLMs Suffer from Hallucinations

Vision-Language Models (VLMs) have made huge strides on tasks such as image captioning and visual question answering. However, they still suffer from hallucinations, generating descriptions of nonexistent objects or concepts.

Previous approaches generally fall into two paradigms:

  • Generation Adjustment: This method aims to improve the alignment of textual outputs with visual inputs by modifying the VLM’s generation process. This can be done either in a training-free manner (adjusting logits at decoding time) or through a training-based approach (introducing additional supervision signals or custom objective functions).
  • Post-hoc Verification: This method introduces large external models (e.g., GPT-4) to evaluate and verify outputs after generation.

However, generation adjustment methods struggle to correct erroneous tokens once they are generated and do not leverage retrospective reasoning to assess output quality. Post-hoc verification, on the other hand, is computationally expensive and often results in generic refusals rather than targeted improvements.

REVERSE: REtrospective VERification and SElf-correction

To mitigate this, we introduce REVERSE (REtrospective VERification and SElf-correction), the first framework to integrate generation adjustment with online post-hoc verification within a single VLM architecture. REVERSE detects, backtracks, and corrects hallucinations during the decoding process.

REVERSE's Training Recipe: A New Training Dataset with a Custom Loss

  • Introducing Three Special Tokens: REVERSE training introduces three special tokens that explicitly mark key phrases and encode the model's confidence level: <SPAN> marks the start of a key phrase, and the phrase ends with either </CN> (for confident/grounded) or </UN> (for unconfident/potentially hallucinated). These tokens act like in-line confidence classifiers, enabling the model not only to flag uncertainty but also to determine where to backtrack.
  • A New Training Dataset: Annotating data with these tokens, we built a 1.3M-sample instruction-tuning dataset that augments LLaVA-v1.5. Our dataset maintains an overall composition similar to the LLaVA-v1.5 dataset, preserving its data quality, the same average number of question-answer pairs per sample, and a comparable question-type distribution.
  • Hallucination-Aware Training Objectives:
    • Standard next-token prediction: Retains conventional instruction tuning behavior for answer generation.
    • Avoiding hallucination modeling: Minimizes the likelihood of producing tokens labeled as hallucinated in the dataset.
    • Confidence tagging: Teaches the model when to emit <SPAN>, and whether to end with </CN> or </UN> as a signal of groundedness.

We achieve this through a weighted token loss: positive weights are applied to <SPAN> and </CN> tokens, while zero weights are assigned to tokens inside <SPAN>...</UN> sections, effectively masking them to avoid penalizing the model when generating ungrounded phrases.
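
To make the weighting concrete, below is a minimal PyTorch-style sketch of such a loss. It is an illustration, not the paper's implementation: the special-token ids and the span_weight value are placeholders (the real ids come from the tokenizer), padding is ignored, and only the weighting described above is modeled. For instance, a hypothetical training target might read "There is <SPAN>a red stop sign</CN> next to <SPAN>a blue mailbox</UN>.", where the second phrase is labeled as ungrounded.

import torch
import torch.nn.functional as F

# Placeholder ids for the special tokens; the actual ids come from the tokenizer.
SPAN_ID, CN_ID, UN_ID = 32001, 32002, 32003

def hallucination_aware_loss(logits, labels, span_weight=2.0):
    """Weighted next-token cross-entropy (sketch; span_weight is illustrative).

    - Up-weights the <SPAN> and </CN> tokens (positive weight).
    - Zero-weights tokens that fall inside a <SPAN> ... </UN> section,
      masking the wording of phrases labeled as ungrounded.
    - Leaves all other tokens at the standard weight of 1.0.
    """
    logits = logits[:, :-1, :]          # predict token t+1 from position t
    targets = labels[:, 1:]

    weights = torch.ones_like(targets, dtype=torch.float)
    weights[(targets == SPAN_ID) | (targets == CN_ID)] = span_weight

    # Zero out tokens between <SPAN> and a closing </UN>.
    for b in range(targets.size(0)):
        inside, span_positions, masked = False, [], []
        for t in range(targets.size(1)):
            tok = targets[b, t].item()
            if tok == SPAN_ID:
                inside, span_positions = True, []
            elif inside and tok == UN_ID:
                masked.extend(span_positions)   # ungrounded span: mask its contents
                inside = False
            elif inside and tok == CN_ID:
                inside = False                  # grounded span: keep normal weights
            elif inside:
                span_positions.append(t)
        weights[b, masked] = 0.0

    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), targets.reshape(-1), reduction="none"
    ).view_as(targets)
    return (per_token * weights).sum() / weights.sum().clamp(min=1.0)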

REVERSE's Inference Paradigm: Retrospective Resampling

During inference, REVERSE performs next-token prediction while monitoring the probability of </UN>. Instead of passively waiting for a hallucination to fully appear, we proactively intervene when the probability exceeds a set confidence threshold (τ). This enables the model to identify and correct hallucinations before they are fully formed.
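
As a minimal sketch, this detection step amounts to checking the probability mass on the </UN> token at every decoding step; the token id and threshold value below are placeholders, not values from the paper.

import torch

UN_ID = 32003   # placeholder id for the </UN> token
TAU = 1e-2      # placeholder confidence threshold; tau is a tunable hyperparameter

@torch.no_grad()
def hallucination_detected(step_logits, tau=TAU):
    """True if the probability of emitting </UN> at this decoding step exceeds tau."""
    probs = torch.softmax(step_logits, dim=-1)   # step_logits: (vocab_size,)
    return probs[UN_ID].item() > tau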

  • Backtracking Strategy:
    1. First, backtrack to the most recent </CN> token.
    2. (After K local correction attempts): The model assumes the issue originates earlier and backtracks to the nearest prior punctuation.
    3. (After N total attempts): The model returns the output with a flag indicating that hallucination is unresolved.
  • Self-Correction Strategies:
    • Rejection Sampling: The model resamples multiple completions at a higher temperature (T+ΔT), searching for an alternative phrase that falls below the hallucination threshold.
    • Query Rewriting: In addition to rejection sampling, REVERSE augments the prompt with hints to improve grounding. Specifically, the prompt is rewritten to include a phrase such as: "Hint: potentially incorrect phrases."
      This instructs the model to revisit uncertain segments and provide a more reliable response. During training, some of the queries are randomly injected with hint-based rewrites so the model learns to recognize and respond to them. A rough sketch of the full correction loop is shown after this list.
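
Putting the backtracking and self-correction strategies together, here is a rough sketch of the retrospective resampling loop, under stated assumptions: generate is a hypothetical callback that continues decoding from a token prefix and returns (continuation, flagged), stopping early whenever the </UN> check above fires; CN_ID and PUNCT_IDS are placeholder token ids; and K, N, and the temperatures are illustrative values, not the paper's.

CN_ID = 32002
PUNCT_IDS = {13, 29889, 29892}   # placeholder token ids for newline, ".", ","

def last_index(tokens, targets, fallback):
    """Index of the last token that belongs to `targets`, or `fallback` if none."""
    for i in range(len(tokens) - 1, -1, -1):
        if tokens[i] in targets:
            return i
    return fallback

def retrospective_resample(prompt_ids, generate,
                           K=3, N=10, base_temp=0.7, delta_temp=0.3):
    tokens = list(prompt_ids)
    prompt_end = len(prompt_ids) - 1
    local_attempts = 0

    for attempt in range(N):
        # Rejection sampling: resample at a higher temperature (T + delta T)
        # after the first failed attempt.
        temp = base_temp if attempt == 0 else base_temp + delta_temp
        continuation, flagged = generate(tokens, temperature=temp)
        tokens = tokens + continuation

        if not flagged:
            return tokens, False          # completion passed the </UN> check

        if local_attempts < K:
            # 1) Backtrack to the most recent </CN> token and retry locally.
            cut = last_index(tokens, {CN_ID}, fallback=prompt_end)
            local_attempts += 1
        else:
            # 2) After K local attempts, assume the issue starts earlier and
            #    backtrack to the nearest punctuation before the last </CN>.
            #    (REVERSE also rewrites the query with a "Hint: ..." here.)
            last_cn = last_index(tokens, {CN_ID}, fallback=prompt_end)
            cut = last_index(tokens[:last_cn], PUNCT_IDS, fallback=prompt_end)
            local_attempts = 0
        tokens = tokens[:max(cut, prompt_end) + 1]

    # 3) After N total attempts, return the output flagged as unresolved.
    return tokens, True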

Open-ended tasks: In these tasks, models often encounter false premises or lack sufficient context. To address this, we adopt a prompting strategy that encourages the model to identify missing information or invalid assumptions, rather than attempting to answer directly. (Modified prompt): "For this question, please point out the false premises or note what information is missing, rather than answering it directly."
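
As a small illustration of this prompting strategy, the modified instruction can simply be prepended to the original open-ended question; the exact prompt-template wiring in REVERSE may differ from this sketch.

FALSE_PREMISE_INSTRUCTION = (
    "For this question, please point out the false premises or note what "
    "information is missing, rather than answering it directly."
)

def augment_open_ended_query(question: str) -> str:
    # Prepend the instruction to the user question before passing it to the VLM.
    return f"{FALSE_PREMISE_INSTRUCTION}\n{question}"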

Examples: Real Evaluation Results

Examples from the AMBER benchmark evaluation for comparison. Hallucinated objects from other VLMs are highlighted in red.


Results: Performance Comparison Across Different Tasks

Our evaluations show that REVERSE achieves state-of-the-art hallucination reduction, outperforming the best existing methods by up to 12% on CHAIR-MSCOCO and 28% on HaloQuest.

Image Captioning Tasks

Performance comparison across different models on the CHAIR-MSCOCO and AMBER(g) benchmarks.


Open-ended Question Answering

Performance comparison across different models on the MMHal (left) and HaloQuest (right) benchmarks.


Discriminative Questions

Performance comparison across different models on the AMBER(d), POPE, and MME-Hall benchmarks.


BibTeX

@article{wu2025reverse,
  title={Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling},
  author={Wu, Tsung-Han and Lee, Heekyung and Ge, Jiaxin and Gonzalez, Joseph E and Darrell, Trevor and Chan, David M},
  journal={arXiv preprint arXiv:2504.13169},
  year={2025}
}