# Awesome-Temporal-Video-Grounding

A list of Temporal Video Grounding (TVG) papers. The task is also commonly referred to as:
- Temporal Sentence Grounding (TSG)
- Video Moment Retrieval (VMR)
- Temporal Activity Localization via Language Query (TALL)

TVG was introduced in 2017 as the task of localizing the moments in a video that are semantically relevant to a given natural language query. Recent studies have begun investigating how to strengthen the grounding capability of large language models (LLMs), enabling them to temporally align visual content with natural language inputs.

Content
- 1 Survey
- 2 Datasets
- 3 LLM for TVG
- 4 Traditional TVG
## 1 Survey

- [TPAMI'23] Temporal Sentence Grounding in Videos: A Survey and Future Directions. Aixin Sun's team, NTU
- [ACM Comput. Surv.'23] A Survey on Video Moment Localization. Liqiang Nie's team, Harbin Institute of Technology
## 2 Datasets

Commonly used benchmarks and the visual features each is typically paired with (an evaluation sketch follows the list):
- Charades-STA: VGG, C3D, I3D, CLIP+SlowFast
- TACoS: C3D, I3D
- ActivityNet Captions: C3D
- QVHighlights: CLIP+SlowFast
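Results on these benchmarks are conventionally reported as Rank@K at temporal-IoU thresholds (e.g., R@1, IoU=0.5/0.7): the fraction of queries whose top-ranked prediction overlaps the ground truth above the threshold. A minimal sketch of the metric; the function names are ours:

```python
def temporal_iou(pred, gt):
    """IoU between two temporal segments, each given as (start_sec, end_sec)."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def recall_at_1(top1_preds, ground_truths, iou_threshold=0.5):
    """Fraction of queries whose top-1 predicted moment reaches the threshold."""
    hits = sum(temporal_iou(p, g) >= iou_threshold
               for p, g in zip(top1_preds, ground_truths))
    return hits / len(ground_truths)

# One query: predicted vs. ground-truth moment, in seconds.
print(round(temporal_iou((10.2, 24.8), (12.0, 25.0)), 2))  # 0.86
```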
## 3 LLM for TVG

### 2023

- [ACL] Generating Structured Pseudo Labels for Noise-resistant Zero-shot Video Sentence Localization. [code]
- [ICCVW] LLaViLo: Boosting Video Moment Retrieval via Adapter-Based Multimodal Modeling.
- [NeurIPS] Self-Chained Image-Language Model for Video Localization and Question Answering. [code]
- [arXiv] Grounding-Prompter: Prompting LLM with Multimodal Information for Temporal Sentence Grounding in Long Videos.
- [arXiv] LLM4VG: Large Language Models Evaluation for Video Grounding.
### 2024

- [ACL] GroundingGPT: Language Enhanced Multi-modal Grounding Model. [code]
- [CVPR] VTimeLLM: Empower LLM to grasp video moments. [code]
- [CVPR] TimeChat: A time-sensitive multimodal large language model for long video understanding. [code]
- [ECCV] Training-free video temporal grounding using large-scale pre-trained models. [code]
- [EMNLP] Efficient Temporal Extrapolation of Multimodal Large Language Models with Temporal Grounding Bridge. [code]
- [NeurIPS] SlowFocus: Enhancing fine-grained temporal understanding in video LLM. [code]
- [arXiv] The Surprising Effectiveness of Multimodal Large Language Models for Video Moment Retrieval. [code]
- [arXiv] LLaVA-MR: Large Language-and-Vision Assistant for Video Moment Retrieval.
- [arXiv] HawkEye: Training Video-Text LLMs for Grounding Text in Videos. [code]
- [arXiv] Video LLMs for temporal reasoning in long videos.
### 2025

- [TMM] ETC: Temporal boundary expand then clarify for weakly supervised video grounding with multimodal large language model.
- [AAAI] VTG-LLM: Integrating timestamp knowledge into video LLMs for enhanced video temporal grounding. [code]
- [AAAI] Zero-shot video moment retrieval via off-the-shelf multimodal large language models.
- [ICLR] TRACE: Temporal grounding video LLM via causal event modeling. [code]
- [ICLR] TimeSuite: Improving MLLMs for long video understanding via grounded tuning. [code]
- [CVPR] SVLTA: Benchmarking vision-language temporal alignment via synthetic video situation. [code]
- [CVPR] ReVisionLLM: Recursive vision-language model for temporal grounding in hour-long videos. [code]
- [CVPR] Number it: Temporal grounding videos like flipping manga. [code]
- [COLING] Mitigating the discrepancy between video and text temporal sequences: A time-perception enhanced video grounding method for LLM.
- [arXiv] Measure Twice, Cut Once: Grasping Video Structures and Event Semantics with LLMs for Video Temporal Localization. [code]
- [arXiv] TimeRefine: Temporal grounding with time refining video LLM. [code]
- [arXiv] TimeZero: Temporal video grounding with reasoning-guided LVLM. [code]
- [arXiv] Time-R1: Post-training large vision language model for temporal video grounding. [code]
- [arXiv] MomentSeeker: A comprehensive benchmark and a strong baseline for moment retrieval within long videos.
- [arXiv] VideoExpert: Augmented LLM for temporal-sensitive video understanding.
- [arXiv] Universal Video Temporal Grounding with Generative Multi-modal Large Language Models.
- [arXiv] VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning. [code]
- [arXiv] Invert4TVG: A temporal video grounding framework with inversion tasks for enhanced action understanding.
## 4 Traditional TVG

### 2017

The TSG task was first proposed this year.

**Proposal-based**
- [ICCV'17] TALL: Temporal Activity Localization via Language Query. Jiyang Gao, USC [code] (see the sliding-window sketch below)
- [ICCV'17] Localizing Moments in Video with Natural Language. Lisa Anne Hendricks, UC Berkeley [code]
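TALL scores sliding-window candidates against the sentence and regresses boundary offsets for the best-scoring ones. A minimal sketch of the candidate-generation side; the window lengths and overlap here are illustrative, not the paper's exact settings:

```python
def sliding_window_proposals(duration, window_sizes=(10.0, 20.0, 40.0), overlap=0.8):
    """Enumerate candidate moments (start_sec, end_sec) over a video.

    Each window length slides across the video with the given overlap ratio;
    a cross-modal model then scores every (candidate clip, sentence) pair and
    regresses fine-grained offsets for the winning boundaries.
    """
    proposals = []
    for size in window_sizes:
        stride = size * (1.0 - overlap)
        start = 0.0
        while start + size <= duration:
            proposals.append((start, start + size))
            start += stride
    return proposals

print(len(sliding_window_proposals(120.0)))  # candidate count for a 2-minute video
```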
### 2018

**Proposal-based**
- [EMNLP'18] Temporally Grounding Natural Sentence in Video. Tat-Seng Chua's team, NUS
- [IJCAI'18] Multi-modal Circulant Fusion for Video-to-Language and Backward. Yahong Han's team, Tianjin University
- [ACM MM'18] Cross-modal Moment Localization in Videos. Liqiang Nie's team, Shandong University [code]
- [SIGIR'18] Attentive Moment Retrieval in Videos. Liqiang Nie's team, Shandong University [code]

**Proposal-free**
- [AAAI'19] Localizing Natural Language in Videos. Tencent AI Lab

**Reconstruction-based**
- [NeurIPS'18] Weakly Supervised Dense Event Captioning in Videos. Wenwu Zhu's team, Tsinghua [code]
  - First proposed weakly supervised dense event captioning, whose training involves the TSG problem.
### 2019

**Proposal-based**
- [AAAI'19] Semantic Proposal for Activity Localization in Videos via Sentence Query. Yu-Gang Jiang's team, Fudan
- [CVPR'19] MAN: Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment. Da Zhang, UCSB
- [ACM MM'19] Exploiting Temporal Relationships in Video Moment Localization with Natural Language. Jiebo Luo's team, University of Rochester [code]
- [NeurIPS'19] Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos. Wenwu Zhu's team, Tsinghua [code]
- [SIGIR'19] Cross-Modal Interaction Networks for Query-Based Moment Retrieval in Videos. Zhou Zhao's team, Zhejiang University [code]
- [WACV'19] MAC: Mining Activity Concepts for Language-based Temporal Localization. USC [code]

**Proposal-free**
- [AAAI'19] Multilevel Language and Vision Integration for Text-to-Clip Retrieval. Huijuan Xu, Boston University [code]
- [AAAI'19] To Find Where You Talk: Temporal Sentence Localization in Video with Attention Based Location Regression. Wenwu Zhu's team, Tsinghua [code]
- [EMNLP'19] DEBUG: A Dense Bottom-Up Grounding Approach for Natural Language Video Localization. Jun Xiao's team, Zhejiang University

**RL-based**
- [AAAI'19] Read, Watch, and Move: Reinforcement Learning for Temporally Grounding Natural Language Descriptions in Videos. Baidu
- [CVPR'19] Language-Driven Temporal Activity Localization: A Semantic Matching Reinforcement Learning Model. Liang Wang's team, Chinese Academy of Sciences

**MIL-based**
- [CVPR'19] Weakly Supervised Video Moment Retrieval From Text Queries. Amit K. Roy-Chowdhury's team, UC Riverside [code]
  - Formally proposed the weakly supervised temporal sentence grounding task.
- [EMNLP'19] WSLLN: Weakly Supervised Natural Language Localization Networks. Salesforce
### 2020

**Proposal-based**
- [AAAI'20] Learning 2D Temporal Adjacent Networks for Moment Localization with Natural Language. Jiebo Luo's team, University of Rochester [code]
  - First proposed the 2D-map formulation; most later proposal-based papers build on it (see the sketch below).
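In the 2D-map formulation, cell (i, j) of an N×N map represents the candidate moment that starts at clip i and ends at clip j (valid only for j ≥ i); pooled moment features are then scored jointly, so adjacent candidates give each other context. A minimal numpy sketch of building such a map, with mean pooling standing in for the paper's pooling choices:

```python
import numpy as np

def build_2d_moment_map(clip_feats):
    """clip_feats: (N, D) per-clip features.

    Returns an (N, N, D) map whose cell (i, j) holds the mean-pooled feature
    of the moment spanning clips i..j; cells with j < i stay zero (invalid).
    """
    n, d = clip_feats.shape
    cum = np.cumsum(clip_feats, axis=0)  # prefix sums over clips
    moment_map = np.zeros((n, n, d), dtype=clip_feats.dtype)
    for i in range(n):
        for j in range(i, n):
            total = cum[j] - (cum[i - 1] if i > 0 else 0.0)
            moment_map[i, j] = total / (j - i + 1)
    return moment_map

feats = np.random.randn(16, 32).astype(np.float32)
print(build_2d_moment_map(feats).shape)  # (16, 16, 32)
```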
**Proposal-free**
- [ACL'20] Span-based Localizing Network for Natural Language Video Localization. Aixin Sun's team, NTU [code] (see the span-decoding sketch below)
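Span-based methods treat the video as the "passage" of extractive QA: two heads output per-clip start and end distributions, and the prediction is the highest-scoring valid span. A minimal decoding sketch over precomputed logits; the O(N²) scan is for clarity and the names are ours:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def decode_span(start_logits, end_logits, max_len=None):
    """Return (s, e) with e >= s maximizing start_prob[s] * end_prob[e]."""
    start_p, end_p = softmax(start_logits), softmax(end_logits)
    n = len(start_p)
    best, best_score = (0, 0), -1.0
    for s in range(n):
        last = n if max_len is None else min(n, s + max_len)  # cap span length
        for e in range(s, last):
            score = start_p[s] * end_p[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best  # clip indices; multiply by the clip duration to get seconds

rng = np.random.default_rng(0)
print(decode_span(rng.normal(size=100), rng.normal(size=100), max_len=30))
```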
**Reconstruction-based**
- [AAAI'20] Weakly-Supervised Video Moment Retrieval via Semantic Completion Network. Zhou Zhao's team, Zhejiang University [code]
  - First used masked reconstruction for the weakly supervised TSG task (see the sketch below).
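The masked-reconstruction idea: generate proposals, mask important words in the query, and reconstruct them from each proposal's visual feature; only the correct moment carries the needed semantics, so proposals are ranked by reconstruction quality. A toy sketch of that ranking logic, where a random projection stands in for the learned language decoder:

```python
import numpy as np

rng = np.random.default_rng(0)

def reconstruction_score(proposal_feat, masked_word_embs, W):
    """Higher = the proposal reconstructs the masked query words better.

    W projects visual space into word-embedding space; in the real model it
    is a trained decoder, here a random matrix for illustration.
    """
    pred = proposal_feat @ W                         # predicted word embeddings
    return -((pred - masked_word_embs) ** 2).mean()  # negative reconstruction loss

d_vis, d_word, n_masked = 32, 16, 2
W = rng.normal(size=(d_vis, d_word))
masked_words = rng.normal(size=(n_masked, d_word))
proposals = rng.normal(size=(5, d_vis))              # 5 candidate moments
scores = [reconstruction_score(p, masked_words, W) for p in proposals]
print(int(np.argmax(scores)))                        # best-reconstructing proposal
```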
### 2021

**Proposal-based**
- [SIGIR'21] Deconfounded Video Moment Retrieval with Causal Intervention. Tat-Seng Chua's team, NUS [code]
  - Introduces causal inference into TSG to remove the bias induced by moment-location priors in videos.
- [CVPR'21] Interventional Video Grounding with Dual Contrastive Learning. Guoshun Nan, BUPT
  - Contrastive learning + causal intervention.
- [CVPR'21] Multi-Modal Relational Graph for Cross-Modal Video Moment Retrieval. Da Cao's team, Hunan University
- [ICCV'21] Fast Video Moment Retrieval. Changsheng Xu's team, Chinese Academy of Sciences

**Proposal-free**
- [TPAMI'21] Natural Language Video Localization: A Revisit in Span-Based Question Answering Framework. Aixin Sun's team, NTU
  - An extended version of VSLNet (ACL'20).
- [TMM'21] Frame-Wise Cross-Modal Matching for Video Moment Retrieval. Zhiyong Cheng's team, Qilu University of Technology [code]

**DETR-based**
- [NeurIPS'21] QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries. Jie Lei, UNC [code]
  - Jointly tackles moment retrieval (MR) and highlight detection (HD); the first work to bring DETR into VMR (see the sketch below).
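The DETR-style formulation replaces hand-crafted proposals with a small set of learnable moment queries: the transformer decoder emits one normalized (center, width) span plus a confidence per query, trained with Hungarian matching. A minimal sketch of turning such decoder outputs into ranked moments; the names are ours:

```python
import numpy as np

def decode_moments(span_preds, scores, duration):
    """span_preds: (Q, 2) normalized (center, width), one per moment query.

    scores: (Q,) foreground confidence. Returns (start_sec, end_sec) spans,
    highest-confidence first.
    """
    centers, widths = span_preds[:, 0], span_preds[:, 1]
    starts = np.clip(centers - widths / 2, 0.0, 1.0) * duration
    ends = np.clip(centers + widths / 2, 0.0, 1.0) * duration
    order = np.argsort(-scores)
    return [(float(starts[q]), float(ends[q])) for q in order]

rng = np.random.default_rng(1)
spans = rng.uniform(0, 1, size=(10, 2))  # outputs of 10 moment queries
conf = rng.uniform(0, 1, size=10)
print(decode_moments(spans, conf, duration=150.0)[0])  # top-ranked moment
```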
**Unsupervised**

The unsupervised setting was first proposed in the works below.
- [ICCV'21] Zero-shot Natural Language Video Localization. Jonghyun Choi's team, Seoul National University [code]
- [TCSVT'21] Learning Video Moment Retrieval Without a Single Annotated Video. Changsheng Xu's team, Chinese Academy of Sciences
### 2022

**Proposal-based**
- [SIGIR'22] You Need to Read Again: Multi-granularity Perception Network for Moment Retrieval in Videos. Xi Zhou's team, SJTU [code]
- [TCSVT'22] Efficient Video Grounding With Which-Where Reading Comprehension. Xi Zhou's team, SJTU

**Proposal-free**
- [TIP'22] HiSA: Hierarchically Semantic Associating for Video Temporal Grounding. Cheng Deng's team, Xidian University [code]

**DETR-based**
- [CVPR'22] UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection. Tencent ARC Lab [code]

**Reconstruction-based**
- [AAAI'22] Weakly Supervised Video Moment Localization with Contrastive Negative Sample Mining. Yang Liu's team, Peking University [code]
- [CVPR'22] Weakly Supervised Temporal Sentence Grounding with Gaussian-based Contrastive Proposal Learning. Yang Liu's team, Peking University [code]
  - Mines negative samples to better distinguish easily confused scenes within the same video (see the sketch below).
  - Subsequent weakly supervised methods largely build on CPL as their baseline.
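CPL parameterizes each proposal as a Gaussian mask over frames (a predicted center and width) and contrasts the positive proposal against hard negatives mined from the same video. A minimal sketch of the Gaussian masking and masked pooling; the width-to-sigma mapping is an illustrative choice:

```python
import numpy as np

def gaussian_mask(center, width, n_frames):
    """Soft temporal mask; center and width are normalized to [0, 1]."""
    t = np.linspace(0.0, 1.0, n_frames)
    sigma = width / 2.0  # illustrative mapping from width to spread
    return np.exp(-0.5 * ((t - center) / (sigma + 1e-6)) ** 2)

def masked_pool(frame_feats, mask):
    """Weighted mean of frame features under a proposal's mask."""
    w = mask / (mask.sum() + 1e-6)
    return (frame_feats * w[:, None]).sum(axis=0)

frames = np.random.randn(100, 64)
pos = masked_pool(frames, gaussian_mask(0.4, 0.15, 100))  # positive proposal
neg = masked_pool(frames, gaussian_mask(0.8, 0.15, 100))  # mined negative
print(pos.shape, neg.shape)  # both feed a contrastive loss against the query
```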
**Point-supervised**

The single-frame (glance) supervision setting was first proposed in the works below.
- [TMM'22] Point-Supervised Video Temporal Grounding. Cheng Deng's team, Xidian University
- [SIGIR'22] Video Moment Retrieval from Text Queries via Single Frame Annotation. Yu-Gang Jiang's team, Fudan [code]
### 2023

**Proposal-based**
- [AAAI'23] Phrase-Level Temporal Relationship Mining for Temporal Sentence Localization. Yang Liu's team, Peking University [code]
- [ICCV'23] G2L: Semantically Aligned and Uniform Video Grounding via Geodesic and Game Theory. Yuexian Zou's team, Peking University

**Proposal-free**

**DETR-based**
- [ACL'23] MS-DETR: Natural Language Video Localization with Sampling Moment-Moment Interaction. Aixin Sun's team, NTU [code]
- [CVPR'23] Query-Dependent Video Representation for Moment Retrieval and Highlight Detection. Jae-Pil Heo's team, Sungkyunkwan University [code]
- [ICCV'23] Knowing Where to Focus: Event-aware Transformer for Video Grounding. Kwanghoon Sohn's team, Yonsei University [code]
- [NeurIPS'23] MomentDiff: Generative Video Moment Retrieval from Random to Real. Hongtao Xie's team, USTC [code]
  - Uses diffusion-style denoising to generate the predicted moment (see the sketch below).
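MomentDiff treats the target span as the clean signal of a diffusion process: training noises the ground-truth span, and inference starts from random spans and denoises them step by step, conditioned on the video and the query. A schematic sketch of the forward-noising step under standard DDPM assumptions; the schedule values are illustrative:

```python
import numpy as np

def noise_span(span_0, t, alphas_cumprod, rng):
    """Forward diffusion q(x_t | x_0) on a normalized (center, width) span."""
    a_bar = alphas_cumprod[t]
    eps = rng.normal(size=span_0.shape)
    return np.sqrt(a_bar) * span_0 + np.sqrt(1.0 - a_bar) * eps, eps

T = 1000
betas = np.linspace(1e-4, 0.02, T)      # linear noise schedule
alphas_cumprod = np.cumprod(1.0 - betas)
rng = np.random.default_rng(0)

x0 = np.array([0.45, 0.20])  # ground-truth (center, width), normalized
x_t, eps = noise_span(x0, t=500, alphas_cumprod=alphas_cumprod, rng=rng)
# A video- and query-conditioned denoiser is trained to recover x0 (or eps)
# from (x_t, t); at inference it iteratively refines random spans.
print(x_t)
```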
**Bias**
- [AAAI'23] Curriculum Multi-Negative Augmentation for Debiased Video Grounding. Wenwu Zhu's team, Tsinghua

**Reconstruction-based**
- [CVPR'23] Weakly Supervised Temporal Sentence Grounding with Uncertainty-Guided Self-training. Yoichi Sato's team, University of Tokyo
- [CVPR'23] Iterative Proposal Refinement for Weakly-Supervised Video Grounding. Yuexian Zou's team, Peking University
- [ICCV'23] SCANet: Scene Complexity Aware Network for Weakly-Supervised Video Moment Retrieval. Chang D. Yoo's team, KAIST
- [ICCV'23] D3G: Exploring Gaussian Prior for Temporal Sentence Grounding with Glance Annotation. Tencent YouTu Lab [code]
- [ACL'23] Generating Structured Pseudo Labels for Noise-resistant Zero-shot Video Sentence Localization. Yang Liu's team, Peking University [code]
### 2024

**Proposal-based**
- [ACM MM'24] Maskable Retentive Network for Video Moment Retrieval. Meng Wang's team, Hefei University of Technology [code]
- [AAAI'24] Exploiting Auxiliary Caption for Video Grounding. Yuexian Zou's team, Peking University

**Proposal-free**

**DETR-based**
- [AAAI'24] Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval. Hongtao Xie's team, USTC [code]
  - Targets the modality-imbalance problem.
- [AAAI'24] TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection. Wei Xie's team, Central China Normal University [code]
- [CVPR'24] Task-Driven Exploration: Decoupling and Inter-Task Feedback for Joint Moment Retrieval and Highlight Detection. Ping Wei's team, Xi'an Jiaotong University [code]
- [CVPR'24] Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection. Xiu Li's team, Tsinghua [code]
- [ACM MM'24] Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval. Xiaoyong Wei's team, Hong Kong Baptist University [code]

**Bias**
- [AAAI'24] Bias-Conflict Sample Synthesis and Adversarial Removal Debias Strategy for Temporal Sentence Grounding in Video. Weigang Zhang's team, Harbin Institute of Technology [code]

**Reconstruction-based**
- [AAAI'24] Gaussian Mixture Proposals with Pull-Push Learning Scheme to Capture Diverse Events for Weakly Supervised Temporal Video Grounding. Jin Young Choi's team, Seoul National University [code]
- [AAAI'24] Omnipotent Distillation with LLMs for Weakly-Supervised Natural Language Video Localization: When Divergence Meets Consistency. Alex C. Kot's team, NTU
- [PR'24] Triadic temporal-semantic alignment for weakly-supervised video moment retrieval. Fengyu Zhou's team, Shandong University
- [ACL'24] Exploiting Intrinsic Multilateral Logical Rules for Weakly Supervised Natural Language Video Localization. Cheng Deng's team, Xidian University