
REVERSE-VLM: Vision-Language Model with REtrospective VERification and SElf-correction

HF Collection · arXiv · NeurIPS 2025


Welcome to the official repository for our paper: Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling (NeurIPS 2025). Explore our project page here for an interactive overview!

Authors: Tsung-Han Wu, Heekyung Lee, Jiaxin Ge, Joseph E. Gonzalez, Trevor Darrell, David M. Chan (UC Berkeley & POSTECH)

🔗 Model Checkpoints:

📦 Dataset:

📄 Change Log:

  • [04/17/2025]: REVERSE is now live on HuggingFace and GitHub! Explore the checkpoints, dataset, and full paper from our project site.
  • [05/29/2025]: REVERSE now supports Qwen2.5-VL and remains effective in that setting. Check it out!
  • [09/18/2025]: REVERSE has been accepted to NeurIPS 2025!

🔧 Installation Guide

  1. Clone this repository
git clone https://github.com/tsunghan-wu/reverse_vlm
cd reverse_vlm
  2. Set up the environment
  • For LLaVA:

    conda create -n reverse python=3.10 -y
    conda activate reverse
    pip install --upgrade pip  # enable PEP 660 support
    pip install -e .
    pip install -e ".[train]"
    pip install flash-attn --no-build-isolation --no-cache-dir
  • For Qwen series, please follow the installation guideline in Qwen2-VL-Finetune.

📈 Evaluation

  • Download model checkpoints:

  • Download required evaluation files from Google Drive
    β†’ Unzip and place them into playground/data/eval. Then, follow the included instructions to download additional assets.

  • Run evaluations with:

bash scripts/eval/*.sh

We conduct a 100-round bootstrapped evaluation, so the reported numbers should closely match those in the paper.
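
If you prefer to run the benchmarks one at a time and keep a log per script, a loop like the following minimal sketch also works (the logs/ directory is just an illustrative choice):

mkdir -p logs
# Run each evaluation script individually and save a per-benchmark log.
for script in scripts/eval/*.sh; do
    name=$(basename "$script" .sh)
    echo "=== Running ${name} ==="
    bash "$script" 2>&1 | tee "logs/eval_${name}.log"
done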

🚀 Training

1. Data Preparation

Expected data structure:
playground/data/
├── coco
│   ├── annotations
│   ├── test2017
│   ├── train2017
│   └── val2017
├── gqa
│   └── images
├── ocr_vqa
│   └── images
├── share_textvqa
│   └── images
├── textvqa
│   └── train_images
└── vg
    ├── VG_100K
    └── VG_100K_2
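Before moving on to training, you can sanity-check that the image folders are in place. The sketch below simply mirrors the tree above; DATA_ROOT and the exact subfolders you need are assumptions that may differ depending on which datasets you train on.

DATA_ROOT=playground/data
# Report any expected dataset folder that is missing (paths mirror the tree above).
for d in coco/annotations coco/train2017 coco/val2017 coco/test2017 \
         gqa/images ocr_vqa/images share_textvqa/images \
         textvqa/train_images vg/VG_100K vg/VG_100K_2; do
    [ -d "${DATA_ROOT}/${d}" ] || echo "Missing: ${DATA_ROOT}/${d}"
done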

2. Model Setup and Training

  • Add special tokens to the base LLM:
python3 scripts/add_new_token_to_llava.py
python3 scripts/add_new_token_to_qwen.py

Supported settings:

  • LoRA finetuning for LLaVA-series

  • Direct finetuning for the Qwen2.5-VL model

    • Qwen/Qwen2.5-VL-3B-Instruct
    • To ensure the apple-to-apple comparison, we fine-tune the released Qwen2.5-VL-3B model using both the LLaVA-FT setup and our REVERSE recipe, applying both on the same 100k subset. This allows us to directly compare the impact of our training/inference recipe against the basic training/inference baseline under consistent conditions as the Qwen2.5-VL's instruction tuning data is not publicly available.
  • Launch Training: bash scripts/train/*.sh

3. Merge LoRA Weights (for LLaVA series only)

After training, merge the LoRA adapter weights into the base model:

CUDA_VISIBLE_DEVICES=5 python3 scripts/merge_lora_weights.py --model-path <your lora path> --model-base <the base llm path with new tokens> --save-model-path <final model path>

⚠️ Notes:

  • Set GPU_SETTINGS and MASTER_PORT appropriately when using DeepSpeed.
  • Naming matters:
    • Your LoRA directory should contain llava_lora
    • The final merged model path should contain llava
      This is required due to how LLaVA loads models internally β€” otherwise, it may fail silently or load incorrectly.
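
As a concrete reference, here is a hypothetical merge invocation whose paths satisfy the naming rules above; the flags come from the merge command shown earlier, while all checkpoint paths are placeholders to adapt to your setup.

# Hypothetical paths: the LoRA directory contains "llava_lora" and the merged
# output path contains "llava", as required by the naming rules above.
CUDA_VISIBLE_DEVICES=0 python3 scripts/merge_lora_weights.py \
    --model-path checkpoints/reverse_llava_lora_7b \
    --model-base checkpoints/base_llm_with_new_tokens \
    --save-model-path checkpoints/reverse_llava_7b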

πŸ™ Acknowledgements

We are grateful for the foundational code provided by LLaVA, LLaVA-More, and Fine-tuning Qwen2-VL Series. Using their resources implies agreement with their respective licenses. Our project benefits greatly from these contributions, and we acknowledge their significant impact on our work.

📚 Citation

If you use our work or the implementation in this repo, or find them helpful, please consider citing our paper.

@article{wu2025reverse,
  title={Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling},
  author={Wu, Tsung-Han and Lee, Heekyung and Ge, Jiaxin and Gonzalez, Joseph E and Darrell, Trevor and Chan, David M},
  journal={arXiv preprint arXiv:2504.13169},
  year={2025}
}
