A joint model for action localization and classification in untrimmed video with visual attention

Weimian Li, Wenmin Wang, Xiongtao Chen, Jinzhuo Wang, Ge Li
{"title":"A joint model for action localization and classification in untrimmed video with visual attention","authors":"Weimian Li, Wenmin Wang, Xiongtao Chen, Jinzhuo Wang, Ge Li","doi":"10.1109/ICME.2017.8019335","DOIUrl":null,"url":null,"abstract":"In this paper, we introduce a joint model that learns to directly localize the temporal bounds of actions in untrimmed videos as well as precisely classify what actions occur. Most existing approaches tend to scan the whole video to generate action instances, which are really inefficient. Instead, inspired by human perception, our model is formulated based on a recurrent neural network to observe different locations within a video over time. And, it is capable of producing temporal localizations by only observing a fixed number of fragments, and the amount of computation it performs is independent of input video size. The decision policy for determining where to look next is learned by REINFORCE which is powerful in non-differentiable settings. In addition, different from relevant ways, our model runs localization and classification serially, and possesses a strategy for extracting appropriate features to classify. We evaluate our model on ActivityNet dataset, and it greatly outperforms the baseline. Moreover, compared with a recent approach, we show that our serial design can bring about 9% increase in detection performance.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":"138 3","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE International Conference on Multimedia and Expo (ICME)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICME.2017.8019335","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

In this paper, we introduce a joint model that learns to directly localize the temporal bounds of actions in untrimmed videos and to precisely classify the actions that occur. Most existing approaches scan the whole video to generate action instances, which is inefficient. Instead, inspired by human perception, our model is built on a recurrent neural network that observes different locations within a video over time. It produces temporal localizations after observing only a fixed number of fragments, so the amount of computation it performs is independent of the input video's length. The decision policy that determines where to look next is learned with REINFORCE, which is effective in non-differentiable settings. In addition, unlike related approaches, our model runs localization and classification serially and uses a strategy for extracting appropriate features for classification. We evaluate our model on the ActivityNet dataset, where it greatly outperforms the baseline. Moreover, compared with a recent approach, we show that our serial design yields roughly a 9% increase in detection performance.
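The abstract only outlines the mechanism: a recurrent model takes a fixed number of "glimpses" at video fragments, predicts temporal bounds, and learns its where-to-look-next policy with REINFORCE because the fragment-selection step is non-differentiable. The sketch below is a minimal, hypothetical PyTorch illustration of that idea, not the paper's implementation; the feature dimensions, the Gaussian location policy, the IoU-style reward, and all names (`RecurrentGlimpseLocalizer`, `loc_head`, `reinforce_loss`, etc.) are assumptions introduced here for clarity.

```python
import torch
import torch.nn as nn
from torch.distributions import Normal


class RecurrentGlimpseLocalizer(nn.Module):
    """Hypothetical sketch: an RNN that observes a fixed number of video
    fragments and, at each step, emits where to look next plus a candidate
    temporal localization. Not the authors' architecture."""

    def __init__(self, feat_dim=2048, hid_dim=512):
        super().__init__()
        self.rnn = nn.GRUCell(feat_dim, hid_dim)
        self.loc_head = nn.Linear(hid_dim, 1)    # mean of next glimpse location in [0, 1]
        self.bound_head = nn.Linear(hid_dim, 2)  # predicted (start, end), normalized
        self.conf_head = nn.Linear(hid_dim, 1)   # confidence that an action is present

    def forward(self, video_feats, n_glimpses=6, loc_std=0.1):
        # video_feats: (T, feat_dim) pre-extracted fragment features for one video.
        # The number of glimpses is fixed, so compute does not grow with T.
        T = video_feats.size(0)
        h = video_feats.new_zeros(1, self.rnn.hidden_size)
        loc = video_feats.new_full((1,), 0.5)    # start by looking at the middle
        log_probs, bounds, confs = [], [], []
        for _ in range(n_glimpses):
            idx = (loc.clamp(0, 1) * (T - 1)).long()
            x = video_feats[idx]                 # observe one fragment: (1, feat_dim)
            h = self.rnn(x, h)
            # Stochastic policy over the next observation location (non-differentiable
            # sampling step, hence REINFORCE).
            mu = torch.sigmoid(self.loc_head(h)).squeeze(1)
            dist = Normal(mu, loc_std)
            loc = dist.sample()
            log_probs.append(dist.log_prob(loc))
            bounds.append(torch.sigmoid(self.bound_head(h)))
            confs.append(torch.sigmoid(self.conf_head(h)))
        return torch.stack(log_probs), torch.stack(bounds), torch.stack(confs)


def reinforce_loss(log_probs, rewards, baseline=0.0):
    """REINFORCE objective for the location policy. The reward definition is an
    assumption; e.g. the IoU between each predicted bound and the ground-truth
    segment could serve as the per-glimpse reward."""
    return -((rewards - baseline) * log_probs.squeeze(-1)).mean()
```

The key property the sketch tries to mirror is the one claimed in the abstract: only `n_glimpses` fragments are ever read, so the cost per video is constant, and the localization predictions could then be passed to a separate classifier, matching the serial localization-then-classification design.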