Reinforcement Learning of Vision Language Models with Self Visual Perception Reward
reinforcement-learning self-improvement self-rewarding vision-language-models qwen-vl grpo self-evolving-ai visual-perception-reward
-
Updated
Sep 12, 2025 - Python