GateHUB: Gated History Unit with Background Suppression for Online Action Detection

Junwen Chen, Gaurav Mittal, Ye Yu, Yu Kong, Mei Chen
Published in: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022-06-01
DOI: 10.1109/CVPR52688.2022.01930 (https://doi.org/10.1109/CVPR52688.2022.01930)
Citation count: 13

Abstract

Online action detection is the task of predicting an action as soon as it happens in a streaming video. A major challenge is that the model has no access to the future and must rely solely on the history, i.e., the frames observed so far, to make predictions. It is therefore important to accentuate the parts of the history that are more informative for predicting the current frame. We present GateHUB, Gated History Unit with Background Suppression, which comprises a novel position-guided gated cross-attention mechanism that enhances or suppresses parts of the history according to how informative they are for current-frame prediction. GateHUB further proposes Future-augmented History (FaH), which makes history features more informative by using subsequently observed frames when available. In a single unified framework, GateHUB integrates the transformer's capacity for long-range temporal modeling with the recurrent model's capacity to selectively encode relevant information. GateHUB also introduces a background suppression objective to further mitigate false positives on background frames that closely resemble action frames. Extensive validation on three benchmark datasets, THUMOS, TVSeries, and HDD, demonstrates that GateHUB significantly outperforms all existing methods and is also more efficient than the existing best work. Furthermore, a flow-free version of GateHUB achieves higher or comparable accuracy at a 2.8× higher frame rate than all existing methods that require both RGB and optical flow information for prediction.
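The abstract does not give the formula for the gated cross-attention, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a per-frame gate in (0, 1) computed from the position-augmented history features, whose logarithm is added to the attention logits; this is equivalent to rescaling each history frame's unnormalized attention weight by its gate. The names `gated_cross_attention`, `pos`, `w_gate`, and `b_gate` are hypothetical, and the learned query/key/value projections of a real transformer layer are omitted for brevity.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def gated_cross_attention(queries, history, pos, w_gate, b_gate=0.0):
    """Illustrative gated cross-attention (an assumption, not the paper's exact form).

    queries: (Lq, d) query vectors for the current prediction
    history: (T, d)  features of the frames observed so far
    pos:     (T, d)  positional encoding of each history frame
    w_gate:  (d,)    hypothetical gate projection
    """
    d = queries.shape[-1]
    keys = history + pos                          # position-guided keys
    logits = queries @ keys.T / np.sqrt(d)        # (Lq, T) scaled dot products
    gate_score = keys @ w_gate + b_gate           # (T,) one score per history frame
    log_gate = -np.logaddexp(0.0, -gate_score)    # stable log(sigmoid(score)), always <= 0
    weights = softmax(logits + log_gate[None, :]) # low-gate frames are suppressed
    return weights @ history, weights, np.exp(log_gate)


# Toy usage with random features: 6 history frames, 2 queries, dimension 4.
rng = np.random.default_rng(0)
history = rng.normal(size=(6, 4))
queries = rng.normal(size=(2, 4))
pos = 0.1 * rng.normal(size=(6, 4))
w_gate = rng.normal(size=4)
out, weights, gate = gated_cross_attention(queries, history, pos, w_gate)
```

Because the log-gate is never positive, a frame can only lose attention mass relative to ungated attention before renormalization; a gate close to 1 leaves that frame's logit essentially unchanged.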