Two Stream Dynamic Threshold Network for Weakly-Supervised Temporal Action Localization

Hao Yan, Jun Cheng, Qieshi Zhang, Ziliang Ren, Shijie Sun, Qin Cheng

2021 IEEE International Conference on Real-time Computing and Robotics (RCAR), July 15, 2021

DOI: 10.1109/RCAR52367.2021.9517513
Abstract
Current mainstream temporal action localization methods are fully supervised and therefore require substantial time to annotate frame-level labels. Weakly-supervised methods greatly alleviate this problem, as they need only video-level labels to train the models. To generate accurate action localization boundaries, the recent Two-Stream Consensus Network (TSCN) proposed an attention normalization loss that explicitly pushes attention values toward extreme values to avoid ambiguity. However, most previous methods, including TSCN, apply a fixed threshold in the attention loss to polarize the attention values, which lacks flexibility across different videos. In this paper, we propose a Dynamic Threshold Weakly-supervised action Localization (DH-WTAL) method to address this problem. DH-WTAL features a dynamic threshold decision for the attention mechanism: the dynamic threshold controls the number of snippets selected for each video, which in turn adjusts the extreme values of the attention mechanism accordingly. Extensive experiments demonstrate that DH-WTAL outperforms the TSCN baseline, and an ablation study validates the effectiveness of the method.
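The mechanism described above can be illustrated with a minimal sketch. The paper's exact formulation is not given in the abstract, so the following assumes a TSCN-style normalization loss (mean of the top-k attention values minus the mean of the bottom-k, negated so that minimizing it polarizes attentions) and uses a hypothetical per-video heuristic, `dynamic_ratio`, to stand in for the dynamic threshold decision:

```python
import numpy as np

def attention_norm_loss(attn, ratio):
    """TSCN-style attention normalization loss (sketch).

    attn:  (T,) per-snippet attention values in [0, 1].
    ratio: fraction of snippets forming the top/bottom sets.

    Minimizing the loss pushes the top-k attentions toward 1
    and the bottom-k toward 0, polarizing the attention curve.
    """
    T = len(attn)
    k = max(1, int(T * ratio))          # number of snippets selected
    s = np.sort(attn)
    top_mean = s[-k:].mean()            # most-attended snippets
    bot_mean = s[:k].mean()             # least-attended snippets
    return -(top_mean - bot_mean)

def dynamic_ratio(attn, base=0.125):
    """Hypothetical dynamic threshold: scale the selection ratio by the
    fraction of snippets whose attention exceeds the video's mean, so
    videos with more high-attention snippets select more of them.
    This is an illustrative heuristic, not the paper's exact rule."""
    frac_active = float((attn > attn.mean()).mean())
    return float(np.clip(base * (0.5 + frac_active), 0.05, 0.5))
```

A fixed `ratio` applies the same selection to every video; replacing it with `dynamic_ratio(attn)` lets videos with long actions polarize more snippets than videos with brief ones, which is the flexibility the abstract argues a fixed threshold lacks.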