Temporal Factorized Bilinear Modules with 2D CNN for Action Recognition in Videos

Jue Wang, Huanzhang Lu, Yao Zhang, Feng Ma, Moufa Hu
{"title":"Temporal Factorized Bilinear Modules with 2D CNN for Action Recognition in Videos","authors":"Jue Wang, Huanzhang Lu, Yao Zhang, Feng Ma, Moufa Hu","doi":"10.1109/icccs55155.2022.9846526","DOIUrl":null,"url":null,"abstract":"Action recognition is to automatically detect and classify human’s action in videos, with difficulty lies in modeling temporal relationship between frame sequences. The well-used 2D convolution neural network (CNN) is not suitable for this work, due to lacking temporal modeling ability. In this paper, a novel 2D CNN with inter frame information extraction module based on bilinear operation is proposed to deal with this problem. This model can greatly improve the temporal modeling ability of 2D CNN and just introduce a small amount of storage and calculation via parameter decomposition method. In addition, it has a flexible form to easily make tradeoff between performance and complexity. Finally, the effectiveness of this new network is validated on two kinds of benchmarks including both temporal-related (Something-Something v1) and scene-related(mini-kinetics), with top-1 accuracy 44.5% and 67.8% respectively, which reach or exceed the performance of existing methods with the similar model complexity.","PeriodicalId":121713,"journal":{"name":"2022 7th International Conference on Computer and Communication Systems (ICCCS)","volume":"331 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 7th International Conference on Computer and Communication Systems (ICCCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/icccs55155.2022.9846526","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Action recognition is the task of automatically detecting and classifying human actions in videos; the main difficulty lies in modeling the temporal relationships between frames. The widely used 2D convolutional neural network (CNN) is not well suited to this task because it lacks temporal modeling ability. In this paper, a novel 2D CNN with an inter-frame information extraction module based on a bilinear operation is proposed to address this problem. The module greatly improves the temporal modeling ability of the 2D CNN while introducing only a small amount of extra storage and computation, thanks to a parameter decomposition method. In addition, its flexible form makes it easy to trade off performance against complexity. Finally, the effectiveness of the new network is validated on two kinds of benchmarks, a temporal-related one (Something-Something v1) and a scene-related one (Mini-Kinetics), achieving top-1 accuracies of 44.5% and 67.8% respectively, which match or exceed existing methods of similar model complexity.
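Only the paper itself gives the exact module definition. As a rough illustration of the idea described in the abstract, a bilinear interaction between neighboring frames whose weight matrix is factorized into low-rank projections so a 2D CNN gains temporal modeling at small extra cost, the sketch below shows one plausible form in PyTorch. The class name TemporalFactorizedBilinear, the rank parameter, the residual connection, and the zero-padding of the last time step are assumptions made for illustration, not the authors' exact design.

```python
# Hypothetical sketch of a temporal factorized bilinear module (assumed form,
# not the paper's implementation). Assumes an (N, T, C, H, W) clip tensor
# produced by a 2D backbone stage.
import torch
import torch.nn as nn


class TemporalFactorizedBilinear(nn.Module):
    """Inter-frame interaction via a low-rank (factorized) bilinear form.

    For each pair of neighboring frames (x_t, x_{t+1}), the full bilinear
    output z = x_t^T W x_{t+1} (one C x C matrix per output channel) is
    approximated with rank-R factors W ~= U V^T, so only two 1x1
    convolutions and an element-wise product are needed.
    """

    def __init__(self, channels: int, rank: int = 4):
        super().__init__()
        self.rank = rank
        # U and V project each frame's features into `channels * rank` dims.
        self.proj_u = nn.Conv2d(channels, channels * rank, kernel_size=1, bias=False)
        self.proj_v = nn.Conv2d(channels, channels * rank, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, T, C, H, W)
        n, t, c, h, w = x.shape
        cur = x[:, :-1].reshape(n * (t - 1), c, h, w)   # frames x_t
        nxt = x[:, 1:].reshape(n * (t - 1), c, h, w)    # frames x_{t+1}
        u = self.proj_u(cur).view(n * (t - 1), c, self.rank, h, w)
        v = self.proj_v(nxt).view(n * (t - 1), c, self.rank, h, w)
        # Low-rank bilinear interaction: element-wise product, summed over rank.
        z = (u * v).sum(dim=2)
        z = self.bn(z).view(n, t - 1, c, h, w)
        # Zero-pad the last time step so the temporal length is preserved,
        # then add the interaction as a residual to the per-frame features.
        z = torch.cat([z, torch.zeros_like(z[:, :1])], dim=1)
        return x + z


if __name__ == "__main__":
    clip = torch.randn(2, 8, 64, 14, 14)           # 2 clips of 8 frames
    module = TemporalFactorizedBilinear(channels=64, rank=4)
    print(module(clip).shape)                       # torch.Size([2, 8, 64, 14, 14])
```

In this sketch the rank of the factorization is the knob for the performance/complexity trade-off mentioned in the abstract: a larger rank captures richer inter-frame interactions at the cost of more parameters and computation.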