AImageLab

97 repositories

DitHub
Public
HTML • Apache License 2.0 • 0 • 3 • 0 • 0 • Updated Sep 20, 2025
fed-mammoth
Public
General Federated Continual Learning Framework
Python • MIT License • 3 • 5 • 1 • 0 • Updated Sep 15, 2025
ReT
Public
[CVPR 2025] Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval
information-retrieval recurrent-neural-networks embeddings multimodal-retrieval rag multimodal-embeddings
Python • Apache License 2.0 • 1 • 23 • 1 • 0 • Updated Sep 12, 2025
Alfie
Public
Democratising RGBA Image Generation With No $$$ (AI4VA@ECCV24)
rgba text-to-image diffusion-model diffusion-models text-to-image-generation diffusion-transformer
Python • 1 • 30 • 0 • 0 • Updated Sep 12, 2025
ReT-2
Public
Recurrence Meets Transformers for Universal Multimodal Retrieval
nlp information-retrieval ai computer-vision multimedia retrieval transformers recurrent-neural-networks multimodal rag
Python • Apache License 2.0 • 0 • 6 • 0 • 0 • Updated Sep 12, 2025
coldfront
Public
HPC Resource Allocation System
Python • GNU General Public License v3.0 • 98 • 0 • 0 • 0 • Updated Sep 7, 2025
CHAIR-DPO
Public
[BMVC 2025] Mitigating Hallucinations in Multimodal LLMs via Object-aware Preference Optimization
Python • 0 • 4 • 1 • 0 • Updated Sep 2, 2025
MLLMs-FlowTracker
Public
[CAIP 2025] Tracing Information Flow in LLaMA Vision: A Step Toward Multimodal Understanding
Python • 0 • 2 • 0 • 0 • Updated Aug 30, 2025
MissRAG
Public
[ICCV 2025] MissRAG: Addressing the Missing Modality Challenge in Multimodal Large Language Models
Python • 0 • 11 • 0 • 0 • Updated Aug 30, 2025
mammoth
Public
An Extendible (General) Continual Learning Framework based on PyTorch - official codebase of Dark Experience for General Continual Learning
deep-learning knowledge-distillation neurips2020 dark-experience-replay pytorch der continual-learning experience-replay
Python • MIT License • 126 • 733 • 1 • 0 • Updated Aug 23, 2025
LLaVA-MORE
Public
[ICCVW 25] LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning
vision-and-language llms llava siglip multimodal-llms llama3 llava-llama3 llama3-vision gemma-2 llama3-1
Python • Apache License 2.0 • 8 • 150 • 0 • 0 • Updated Aug 8, 2025
awesome-captioning-evaluation
Public
[IJCAI 2025] Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives
Python • 0 • 21 • 1 • 0 • Updated Aug 4, 2025
ScanDiff
Public
This is the official repository for the paper "Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction". ICCV 2025
0 • 13 • 1 • 0 • Updated Aug 4, 2025
Emuru-autoregressive-text-img
Public
Official PyTorch implementation for "Zero-Shot Styled Text Image Generation, but Make It Autoregressive" (CVPR25)
computer-vision transformers generative-model image-generation auto-regressive-model text-to-image autoregressive-models handwritten-text-generation text-to-image-generation generative-ai
Python • MIT License • 0 • 11 • 1 • 0 • Updated Jul 31, 2025
TransFusion
Public
Official codebase of "Update Your Transformer to the Latest Release: Re-Basin of Task Vectors" - ICML 2025
machine-learning deep-learning pytorch
Python • 0 • 18 • 0 • 0 • Updated Jul 30, 2025
pacscore
Public
[CVPR 2023 & IJCV 2025] Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
computer-vision cvpr captioning-images captioning captioning-videos vision-and-language cvpr2023
Python • 9 • 64 • 4 • 0 • Updated Jul 29, 2025
ReflectiVA
Public
[CVPR 2025] Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering
vqa knowledge-base vlm multimodal mllm
Python • Apache License 2.0 • 0 • 46 • 3 • 0 • Updated Jul 14, 2025
biblical-retrieval-synthesis
Public
[TPDL 2025] Generating Synthetic Data with Large Language Models for Low-Resource Sentence Retrieval
retrieval language-model synthetic-data
0 • 0 • 0 • 0 • Updated Jul 9, 2025
MAD
Public
Official PyTorch implementation for "Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas", presenting the Merge-Attend-Diffuse operator (ECCV24)
generative-model image-generation text-to-image diffusion-models consistency-models text-to-image-generation stable-diffusion stable-diffusion-xl
Python • 1 • 14 • 0 • 0 • Updated Jul 9, 2025
DICE
Public
[ICCV 2025] What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models
0 • 7 • 0 • 0 • Updated Jul 8, 2025
CoDE
Public
[ECCV'24] Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities
deepfake-detection global-local
Python • MIT License • 0 • 44 • 0 • 0 • Updated Jul 2, 2025
MaPeT
Public
Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training
Python • 1 • 16 • 2 • 0 • Updated Jul 1, 2025
synthcap_pp
Public
Official implementation of "Augmenting and Mixing Transformers with Synthetic Data for Image Captioning"
Python • Apache License 2.0 • 0 • 0 • 0 • 0 • Updated Jun 22, 2025
mammoth-lite
Public
Python • 0 • 5 • 0 • 0 • Updated Jun 13, 2025
Sanctuaria-Gaze
Public
Sanctuaria-Gaze is a multimodal dataset of egocentric recordings from visits to four sanctuaries in Northern Italy. Alongside the data, we release an open-source framework for automatic detection and analysis of Areas of Interest (AOIs), enabling gaze-based research in dynamic, real-world settings without manual annotation.
human-visual-attention egocentric-vision gaze-analysis
Python • MIT License • 0 • 1 • 0 • 0 • Updated Jun 10, 2025
FourBi
Public
Binarizing Documents by Leveraging both Space and Frequency. (ICDAR 2024)
binarization document-binarization fast-fourier-convolution
Python • 3 • 13 • 0 • 0 • Updated May 15, 2025
awesome-human-visual-attention
Public
This repository contains a curated list of research papers and resources focusing on saliency and scanpath prediction, human attention, and human visual search.
3 • 56 • 0 • 0 • Updated May 9, 2025
COGT
Public
[ICLR 2025] Causal Graphical Models for Vision-Language Compositional Understanding
Python • 0 • 9 • 0 • 0 • Updated Apr 15, 2025
HySAC
Public
Hyperbolic Safety-Aware Vision-Language Models. CVPR 2025
Python • 0 • 24 • 1 • 0 • Updated Apr 8, 2025
itserr-wp8-latin-embeddings
Public
ITSERR WP8 - Code for Latin embeddings semantic search
information-retrieval latin embeddings
Python • Apache License 2.0 • 0 • 1 • 0 • 0 • Updated Apr 1, 2025