
REVERSE-VLM: Vision-Language Model with REtrospective VERification and SElf-correction

HF Collection · arXiv · NeurIPS 2025


Welcome to the official repository for our paper: Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling (NeurIPS 2025). Explore our project page here for an interactive overview!

Authors: Tsung-Han Wu, Heekyung Lee, Jiaxin Ge, Joseph E. Gonzalez, Trevor Darrell, David M. Chan (UC Berkeley & POSTECH)

🔗 Model Checkpoints:

📦 Dataset:

📄 Change Log:

  • [04/17/2025]: REVERSE is now live on HuggingFace and GitHub! Explore the checkpoints, dataset, and full paper from our project site.
  • [05/29/2025]: REVERSE now supports Qwen2.5-VL and remains effective in that setting. Check it out!
  • [09/18/2025]: REVERSE has been accepted to NeurIPS 2025!

🔧 Installation Guide

  1. Clone this repository
git clone https://github.com/tsunghan-wu/reverse_vlm
cd reverse_vlm
  2. Set up the environment
  • For LLaVA:

    conda create -n reverse python=3.10 -y
    conda activate reverse
    pip install --upgrade pip  # enable PEP 660 support
    pip install -e .
    pip install -e ".[train]"
    pip install flash-attn --no-build-isolation --no-cache-dir
  • For Qwen series, please follow the installation guideline in Qwen2-VL-Finetune.

📈 Evaluation

  • Download model checkpoints:

  • Download required evaluation files from Google Drive
    β†’ Unzip and place them into playground/data/eval. Then, follow the included instructions to download additional assets.

  • Run evaluations with:

bash scripts/eval/*.sh

We conduct a 100-round bootstrapped evaluation, so the reported numbers should closely match those in the paper.
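
If you prefer to run the benchmarks one at a time and keep a log per script, a loop like the following minimal sketch also works (the logs/ directory is just an illustrative choice):

mkdir -p logs
# Run each evaluation script individually and save a per-benchmark log.
for script in scripts/eval/*.sh; do
    name=$(basename "$script" .sh)
    echo "=== Running ${name} ==="
    bash "$script" 2>&1 | tee "logs/eval_${name}.log"
done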

🚀 Training

1. Data Preparation

Expected data structure:
playground/data/
├── coco
│   ├── annotations
│   ├── test2017
│   ├── train2017
│   └── val2017
├── gqa
│   └── images
├── ocr_vqa
│   └── images
├── share_textvqa
│   └── images
├── textvqa
│   └── train_images
└── vg
    ├── VG_100K
    └── VG_100K_2
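Before moving on to training, you can sanity-check that the image folders are in place. The sketch below simply mirrors the tree above; DATA_ROOT and the exact subfolders you need are assumptions that may differ depending on which datasets you train on.

DATA_ROOT=playground/data
# Report any expected dataset folder that is missing (paths mirror the tree above).
for d in coco/annotations coco/train2017 coco/val2017 coco/test2017 \
         gqa/images ocr_vqa/images share_textvqa/images \
         textvqa/train_images vg/VG_100K vg/VG_100K_2; do
    [ -d "${DATA_ROOT}/${d}" ] || echo "Missing: ${DATA_ROOT}/${d}"
done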

2. Model Setup and Training

  • Add special tokens to the base LLM:
python3 scripts/add_new_token_to_llava.py
python3 scripts/add_new_token_to_qwen.py

Supported settings:

  • LoRA finetuning for LLaVA-series

  • Direct finetuning for the Qwen2.5-VL model

    • Qwen/Qwen2.5-VL-3B-Instruct
    • To ensure the apple-to-apple comparison, we fine-tune the released Qwen2.5-VL-3B model using both the LLaVA-FT setup and our REVERSE recipe, applying both on the same 100k subset. This allows us to directly compare the impact of our training/inference recipe against the basic training/inference baseline under consistent conditions as the Qwen2.5-VL's instruction tuning data is not publicly available.
  • Launch Training: bash scripts/train/*.sh

3. Merge LoRA Weights (for LLaVA series only)

After training, merge the LoRA adapter weights into the base model:

CUDA_VISIBLE_DEVICES=5 python3 scripts/merge_lora_weights.py --model-path <your lora path> --model-base <the base llm path with new tokens> --save-model-path <final model path>

⚠️ Notes:

  • Set GPU_SETTINGS and MASTER_PORT appropriately when using DeepSpeed.
  • Naming matters:
    • Your LoRA directory should contain llava_lora
    • The final merged model path should contain llava
      This is required due to how LLaVA loads models internally β€” otherwise, it may fail silently or load incorrectly.
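
As a concrete reference, here is a hypothetical merge invocation whose paths satisfy the naming rules above; the flags come from the merge command shown earlier, while all checkpoint paths are placeholders to adapt to your setup.

# Hypothetical paths: the LoRA directory contains "llava_lora" and the merged
# output path contains "llava", as required by the naming rules above.
CUDA_VISIBLE_DEVICES=0 python3 scripts/merge_lora_weights.py \
    --model-path checkpoints/reverse_llava_lora_7b \
    --model-base checkpoints/base_llm_with_new_tokens \
    --save-model-path checkpoints/reverse_llava_7b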

πŸ™ Acknowledgements

We are grateful for the foundational code provided by LLaVA, LLaVA-More, and Fine-tuning Qwen2-VL Series. Using their resources implies agreement with their respective licenses. Our project benefits greatly from these contributions, and we acknowledge their significant impact on our work.

📚 Citation

If you use our work or the implementation in this repo, or find them helpful, please consider citing our paper.

@article{wu2025reverse,
  title={Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling},
  author={Wu, Tsung-Han and Lee, Heekyung and Ge, Jiaxin and Gonzalez, Joseph E and Darrell, Trevor and Chan, David M},
  journal={arXiv preprint arXiv:2504.13169},
  year={2025}
}
