Welcome to the official repository for our paper: Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling (NeurIPS 2025). Explore our project page here for an interactive overview!
Authors: Tsung-Han Wu, Heekyung Lee, Jiaxin Ge, Joseph E. Gonzalez, Trevor Darrell, David M. Chan (UC Berkeley & POSTECH)
- 🤗 tsunghanwu/reverse_llava_v15
- 🤗 tsunghanwu/reverse_llava_more
- 🤗 tsunghanwu/reverse_qwen25_vl
- [04/17/2025]: REVERSE is now live on HuggingFace and GitHub! Explore the checkpoints, dataset, and full paper from our project site.
- [05/29/2025]: REVERSE now supports Qwen2.5-VL and remains effective on it as well. Check it out!
- [09/18/2025]: REVERSE is accepted to NeurIPS 2025!!!
- Clone this repository
git clone https://github.com/tsunghan-wu/reverse_vlm
cd reverse_vlm
- Set up the environment:
  - For LLaVA:
    conda create -n reverse python=3.10 -y
    conda activate reverse
    pip install --upgrade pip  # enable PEP 660 support
    pip install -e .
    pip install -e ".[train]"
    pip install flash-attn --no-build-isolation --no-cache-dir
  - For the Qwen series, please follow the installation guidelines in Qwen2-VL-Finetune.
- Download model checkpoints:
- 🤗 reverse_llava_v15 (LLaVA-v1.5-7B style model)
- 🤗 reverse_llava_more (LLaVA with Llama-3.1-8B-Instruct style model)
- 🤗 tsunghanwu/reverse_qwen25_vl (Qwen2.5-VL-3B-Instruct style model)
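If you prefer fetching checkpoints programmatically rather than cloning them from the Hub, here is a minimal sketch using huggingface_hub; the local target directory is an assumption and any path works:

```python
# Minimal sketch: pull a REVERSE checkpoint from the Hugging Face Hub.
# The local_dir below is an assumption; point it wherever you keep checkpoints.
from huggingface_hub import snapshot_download

ckpt_dir = snapshot_download(
    repo_id="tsunghanwu/reverse_llava_v15",     # or reverse_llava_more / reverse_qwen25_vl
    local_dir="checkpoints/reverse_llava_v15",  # assumed location
)
print(f"Checkpoint files are in: {ckpt_dir}")
```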
- Download the required evaluation files from Google Drive, unzip them, and place them into playground/data/eval. Then, follow the included instructions to download additional assets.
- Run evaluations with:
bash scripts/eval/*.sh
We run a 100-round bootstrapped evaluation, so the numbers you obtain should closely match those reported in the paper.
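For context, the bootstrapping here means resampling per-example scores with replacement 100 times and averaging the resulting metrics. A minimal sketch of that idea (the scores array is hypothetical; this is not the repo's evaluation code):

```python
# Illustrative 100-round bootstrap over per-example scores (e.g., 0/1 correctness).
# Not the repo's evaluation code; it only demonstrates the resampling idea.
import numpy as np

rng = np.random.default_rng(0)
scores = np.array([1, 0, 1, 1, 0, 1, 1, 0])  # hypothetical per-example results

round_means = [
    rng.choice(scores, size=len(scores), replace=True).mean()
    for _ in range(100)  # 100 bootstrap rounds
]
print(f"bootstrapped accuracy: {np.mean(round_means):.3f} +/- {np.std(round_means):.3f}")
```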
- Download QA pairs from: 🤗 tsunghanwu/reverse-instruct-1.3m
- Organize datasets under playground/data/ using the following structure (following LLaVA's layout):
playground/data/
├── coco
│   ├── annotations
│   ├── test2017
│   ├── train2017
│   └── val2017
├── gqa
│   └── images
├── ocr_vqa
│   └── images
├── share_textvqa
│   └── images
├── textvqa
│   └── train_images
└── vg
    ├── VG_100K
    └── VG_100K_2
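Before training, it can help to sanity-check that the layout above is in place. A small sketch, assuming the tree is rooted at playground/data/ exactly as shown:

```python
# Sanity check: confirm the expected dataset directories exist under playground/data/.
# The list mirrors the tree above; trim it if you only use a subset of the data.
from pathlib import Path

root = Path("playground/data")
expected = [
    "coco/annotations", "coco/test2017", "coco/train2017", "coco/val2017",
    "gqa/images",
    "ocr_vqa/images",
    "share_textvqa/images",
    "textvqa/train_images",
    "vg/VG_100K", "vg/VG_100K_2",
]
missing = [path for path in expected if not (root / path).is_dir()]
print("All dataset directories found." if not missing else f"Missing directories: {missing}")
```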
- Add special tokens to the base LLM:
python3 scripts/add_new_token_to_llava.py
python3 scripts/add_new_token_to_qwen.py
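Conceptually, these scripts register REVERSE's new special tokens with the tokenizer and resize the LLM's embedding matrix accordingly. Below is a hedged transformers sketch of that idea; the token strings and output path are placeholders, not the actual tokens the scripts add, so use the provided scripts for the real setup.

```python
# Illustrative sketch of adding special tokens to a base LLM with transformers:
# register new tokens with the tokenizer, then resize the embedding matrix.
# Token strings and save path are placeholders; the provided scripts define the real ones.
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "lmsys/vicuna-7b-v1.5"  # one of the supported base LLMs listed below
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

new_tokens = ["<hypothetical_token_a>", "<hypothetical_token_b>"]
tokenizer.add_tokens(new_tokens, special_tokens=True)
model.resize_token_embeddings(len(tokenizer))

out_dir = "checkpoints/vicuna-7b-v1.5-with-new-tokens"  # assumed output path
tokenizer.save_pretrained(out_dir)
model.save_pretrained(out_dir)
```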
Supported settings:
- LoRA finetuning for the LLaVA series (see the illustrative PEFT sketch below):
- lmsys/vicuna-7b-v1.5, with mm_projector weights from LLaVA-v1.5-7B's projector
- meta-llama/Llama-3.1-8B-Instruct, with mm_projector weights from LLaVA-MORE-8B's projector
- Direct finetuning for the Qwen2.5-VL model:
- Qwen/Qwen2.5-VL-3B-Instruct
- To ensure an apples-to-apples comparison, we fine-tune the released Qwen2.5-VL-3B model with both the LLaVA-FT setup and our REVERSE recipe on the same 100k subset. Because Qwen2.5-VL's instruction-tuning data is not publicly available, this lets us directly compare our training/inference recipe against the baseline training/inference setup under consistent conditions.
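For the LoRA settings mentioned above, the training scripts wrap the base LLM with a PEFT adapter. The following is an illustrative sketch only; the rank, alpha, and target modules are assumptions, and scripts/train/*.sh define the actual hyperparameters.

```python
# Illustrative PEFT LoRA setup; ranks and target modules are assumptions.
# The repo's scripts/train/*.sh define the actual training hyperparameters.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "checkpoints/vicuna-7b-v1.5-with-new-tokens"  # base LLM after adding new tokens (assumed path)
)
lora_config = LoraConfig(
    r=128,                                                    # assumed rank
    lora_alpha=256,                                           # assumed scaling factor
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed target modules
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```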
- Launch Training:
bash scripts/train/*.sh
After training, merge the LoRA adapter weights into the base model:
CUDA_VISIBLE_DEVICES=5 python3 scripts/merge_lora_weights.py --model-path <your lora path> --model-base <the base llm path with new tokens> --save-model-path <final model path>
⚠️ Notes:
- Set GPU_SETTINGS and MASTER_PORT appropriately when using DeepSpeed.
- Naming matters:
  - Your LoRA directory name should contain llava_lora.
  - The final merged model path should contain llava.
  This is required due to how LLaVA loads models internally; otherwise, it may fail silently or load incorrectly.
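For reference, the merge step above is conceptually what PEFT's merge-and-unload does; here is a minimal sketch under that assumption. The paths are placeholders that follow the naming rules above, and the provided scripts/merge_lora_weights.py should be preferred since it respects LLaVA's loading conventions.

```python
# Conceptual sketch of merging LoRA adapters back into the base model with PEFT.
# Paths are placeholders that follow the naming rules above; prefer the provided
# merge script for real checkpoints.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_path = "checkpoints/vicuna-7b-v1.5-with-new-tokens"  # base LLM with new tokens (assumed)
lora_path = "checkpoints/reverse_llava_lora"              # LoRA dir, name contains "llava_lora"
save_path = "checkpoints/reverse_llava_merged"            # merged model path, name contains "llava"

base_model = AutoModelForCausalLM.from_pretrained(base_path)
merged = PeftModel.from_pretrained(base_model, lora_path).merge_and_unload()
merged.save_pretrained(save_path)
AutoTokenizer.from_pretrained(base_path).save_pretrained(save_path)
```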
We are grateful for the foundational code provided by LLaVA, LLaVA-More, and Fine-tuning Qwen2-VL Series. Utilizing their resources implies agreement with their respective licenses. Our project benefits greatly from these contributions, and we acknowledge their significant impact on our work.
If you use our work or the implementation in this repo, or find them helpful, please consider citing our paper.
@article{wu2025reverse,
title={Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling},
author={Wu, Tsung-Han and Lee, Heekyung and Ge, Jiaxin and Gonzalez, Joseph E and Darrell, Trevor and Chan, David M},
journal={arXiv preprint arXiv:2504.13169},
year={2025}
}