弱监督时空动作定位的自适应原型学习

Wang Luo, Huan Ren, Tianzhu Zhangd, Wenfei Yang, Yongdong Zhang
{"title":"弱监督时空动作定位的自适应原型学习","authors":"Wang Luo, Huan Ren, Tianzhu Zhangd, Wenfei Yang, Yongdong Zhang","doi":"10.1109/TIP.2024.3431915","DOIUrl":null,"url":null,"abstract":"<p><p>Weakly-supervised Temporal Action Localization (WTAL) aims to localize action instances with only video-level labels during training, where two primary issues are localization incompleteness and background interference. To relieve these two issues, recent methods adopt an attention mechanism to activate action instances and simultaneously suppress background ones, which have achieved remarkable progress. Nevertheless, we argue that these two issues have not been well resolved yet. On the one hand, the attention mechanism adopts fixed weights for different videos, which are incapable of handling the diversity of different videos, thus deficient in addressing the problem of localization incompleteness. On the other hand, previous methods only focus on learning the foreground attention and the attention weights usually suffer from ambiguity, resulting in difficulty of suppressing background interference. To deal with the above issues, in this paper we propose an Adaptive Prototype Learning (APL) method for WTAL, which includes two key designs: (1) an Adaptive Transformer Network (ATN) to explicitly model background and learn video-adaptive prototypes for each specific video, (2) an OT-based Collaborative (OTC) training strategy to guide the learning of prototypes and remove the ambiguity of the foreground-background separation by introducing an Optimal Transport (OT) algorithm into the collaborative training scheme between RGB and FLOW streams. These two key designs can work together to learn video-adaptive prototypes and solve the above two issues, achieving robust localization. Extensive experimental results on two standard benchmarks (THUMOS14 and ActivityNet) demonstrate that our proposed APL performs favorably against state-of-the-art methods.</p>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Adaptive Prototype Learning for Weakly-supervised Temporal Action Localization.\",\"authors\":\"Wang Luo, Huan Ren, Tianzhu Zhangd, Wenfei Yang, Yongdong Zhang\",\"doi\":\"10.1109/TIP.2024.3431915\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Weakly-supervised Temporal Action Localization (WTAL) aims to localize action instances with only video-level labels during training, where two primary issues are localization incompleteness and background interference. To relieve these two issues, recent methods adopt an attention mechanism to activate action instances and simultaneously suppress background ones, which have achieved remarkable progress. Nevertheless, we argue that these two issues have not been well resolved yet. On the one hand, the attention mechanism adopts fixed weights for different videos, which are incapable of handling the diversity of different videos, thus deficient in addressing the problem of localization incompleteness. On the other hand, previous methods only focus on learning the foreground attention and the attention weights usually suffer from ambiguity, resulting in difficulty of suppressing background interference. To deal with the above issues, in this paper we propose an Adaptive Prototype Learning (APL) method for WTAL, which includes two key designs: (1) an Adaptive Transformer Network (ATN) to explicitly model background and learn video-adaptive prototypes for each specific video, (2) an OT-based Collaborative (OTC) training strategy to guide the learning of prototypes and remove the ambiguity of the foreground-background separation by introducing an Optimal Transport (OT) algorithm into the collaborative training scheme between RGB and FLOW streams. These two key designs can work together to learn video-adaptive prototypes and solve the above two issues, achieving robust localization. Extensive experimental results on two standard benchmarks (THUMOS14 and ActivityNet) demonstrate that our proposed APL performs favorably against state-of-the-art methods.</p>\",\"PeriodicalId\":94032,\"journal\":{\"name\":\"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/TIP.2024.3431915\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TIP.2024.3431915","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

弱监督时空动作定位(WTAL)的目的是在训练过程中仅使用视频级标签定位动作实例,其中两个主要问题是定位不完整和背景干扰。为了缓解这两个问题,最近的方法采用了一种注意力机制来激活动作实例,同时抑制背景实例,并取得了显著进展。然而,我们认为这两个问题还没有得到很好的解决。一方面,注意力机制对不同视频采用固定权重,无法处理不同视频的多样性,因此在解决定位不完整问题上存在缺陷。另一方面,以往的方法只关注前景注意力的学习,注意力权重通常存在模糊性,导致难以抑制背景干扰。针对上述问题,本文提出了一种适用于 WTAL 的自适应原型学习(APL)方法,其中包括两个关键设计:(1)自适应变换器网络(ATN),用于对背景进行显式建模,并针对每个特定视频学习视频自适应原型;(2)基于 OT 的协同(OTC)训练策略,通过在 RGB 和 FLOW 流之间的协同训练方案中引入最优传输(OT)算法,指导原型学习并消除前景-背景分离的模糊性。这两项关键设计可以共同学习视频自适应原型,解决上述两个问题,从而实现稳健的定位。在两个标准基准(THUMOS14 和 ActivityNet)上进行的大量实验结果表明,我们提出的 APL 与最先进的方法相比表现出色。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Adaptive Prototype Learning for Weakly-supervised Temporal Action Localization.

Weakly-supervised Temporal Action Localization (WTAL) aims to localize action instances with only video-level labels during training, where two primary issues are localization incompleteness and background interference. To relieve these two issues, recent methods adopt an attention mechanism to activate action instances and simultaneously suppress background ones, which have achieved remarkable progress. Nevertheless, we argue that these two issues have not been well resolved yet. On the one hand, the attention mechanism adopts fixed weights for different videos, which are incapable of handling the diversity of different videos, thus deficient in addressing the problem of localization incompleteness. On the other hand, previous methods only focus on learning the foreground attention and the attention weights usually suffer from ambiguity, resulting in difficulty of suppressing background interference. To deal with the above issues, in this paper we propose an Adaptive Prototype Learning (APL) method for WTAL, which includes two key designs: (1) an Adaptive Transformer Network (ATN) to explicitly model background and learn video-adaptive prototypes for each specific video, (2) an OT-based Collaborative (OTC) training strategy to guide the learning of prototypes and remove the ambiguity of the foreground-background separation by introducing an Optimal Transport (OT) algorithm into the collaborative training scheme between RGB and FLOW streams. These two key designs can work together to learn video-adaptive prototypes and solve the above two issues, achieving robust localization. Extensive experimental results on two standard benchmarks (THUMOS14 and ActivityNet) demonstrate that our proposed APL performs favorably against state-of-the-art methods.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信