[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
(AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions
A true multimodal LLaMA derivative -- on Discord!
Implementation of the Q-Former from BLIP-2 in Zeta Lego blocks.
[ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives
Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"
An offline AI-powered video analysis tool with object detection (YOLO), image captioning (BLIP), speech transcription (Whisper), audio event detection (PANNs), and AI-generated summaries (LLMs via Ollama). It runs fully offline to preserve privacy and provides a user-friendly GUI.
A multimodal model for Vietnamese Visual Question Answering (ViVQA).
CLIP Interrogator, fully in HuggingFace Transformers 🤗, with LongCLIP & CLIP's own words and/or *your* own words!
Modifying LAVIS's BLIP-2 Q-Former with models pretrained on Japanese datasets.
Caption images across your datasets with state of the art models from Hugging Face and Replicate!
This repository is for profiling, extracting, visualizing, and reusing generative AI weights, with the aim of building more accurate AI models and auditing/scanning weights at rest to identify knowledge domains and risks.
Caption generator using LAVIS and Argos Translate.
Skin cancer ranks among the most prevalent cancers globally, and early identification is crucial for improving patient outcomes. This study presents a strategy for skin cancer detection using a fine-tuned BLIP-2 (Bootstrapping Language-Image Pre-training) model, optimized via Weight-Decomposed Low-Rank Adaptation (DoRA).
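The entry above describes DoRA-based parameter-efficient fine-tuning of BLIP-2. A minimal sketch of that kind of setup, assuming Hugging Face Transformers plus the peft library's `use_dora` option, is shown below; the checkpoint name, rank, and target modules are illustrative assumptions and are not taken from the repository itself.

```python
# Minimal sketch (assumptions): DoRA-style parameter-efficient fine-tuning of BLIP-2
# with Hugging Face Transformers + peft. Checkpoint, rank, and target_modules are
# illustrative choices, not details taken from the repository above.
import torch
from transformers import Blip2ForConditionalGeneration, Blip2Processor
from peft import LoraConfig, get_peft_model

checkpoint = "Salesforce/blip2-opt-2.7b"  # assumed base checkpoint
processor = Blip2Processor.from_pretrained(checkpoint)
model = Blip2ForConditionalGeneration.from_pretrained(checkpoint, torch_dtype=torch.float16)

# DoRA is toggled through the standard LoRA config in recent peft releases (use_dora=True).
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    use_dora=True,                          # weight-decomposed low-rank adaptation
    target_modules=["q_proj", "v_proj"],    # assumed attention projections in the OPT language model
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # only the DoRA adapter weights remain trainable
```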
Too lazy to organize my desktop, make GPT + BLIP-2 do it /s
Implementation of Q-Former pre-training.
ContextVision is an AI-powered real-time scene understanding assistant that helps visually impaired individuals interpret their surroundings through live video analysis, speech interaction, and AI-driven insights.
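Several of the entries above center on BLIP-2 image captioning through Hugging Face Transformers. The snippet below is a minimal, self-contained sketch of that workflow; the checkpoint and example image URL are assumptions for illustration and are not tied to any specific repository listed here.

```python
# Minimal sketch (assumptions): BLIP-2 image captioning with Hugging Face Transformers.
# The checkpoint and example image URL are illustrative only.
import requests
import torch
from PIL import Image
from transformers import Blip2ForConditionalGeneration, Blip2Processor

device = "cuda" if torch.cuda.is_available() else "cpu"
checkpoint = "Salesforce/blip2-opt-2.7b"  # assumed captioning checkpoint
processor = Blip2Processor.from_pretrained(checkpoint)
model = Blip2ForConditionalGeneration.from_pretrained(checkpoint).to(device)

# Load an example image (any RGB PIL image works here).
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Preprocess, generate, and decode a caption.
inputs = processor(images=image, return_tensors="pt").to(device)
generated_ids = model.generate(**inputs, max_new_tokens=30)
caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(caption)
```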