Jue Wang, Huanzhang Lu, Yao Zhang, Feng Ma, Moufa Hu
{"title":"Temporal Factorized Bilinear Modules with 2D CNN for Action Recognition in Videos","authors":"Jue Wang, Huanzhang Lu, Yao Zhang, Feng Ma, Moufa Hu","doi":"10.1109/icccs55155.2022.9846526","DOIUrl":null,"url":null,"abstract":"Action recognition is to automatically detect and classify human’s action in videos, with difficulty lies in modeling temporal relationship between frame sequences. The well-used 2D convolution neural network (CNN) is not suitable for this work, due to lacking temporal modeling ability. In this paper, a novel 2D CNN with inter frame information extraction module based on bilinear operation is proposed to deal with this problem. This model can greatly improve the temporal modeling ability of 2D CNN and just introduce a small amount of storage and calculation via parameter decomposition method. In addition, it has a flexible form to easily make tradeoff between performance and complexity. Finally, the effectiveness of this new network is validated on two kinds of benchmarks including both temporal-related (Something-Something v1) and scene-related(mini-kinetics), with top-1 accuracy 44.5% and 67.8% respectively, which reach or exceed the performance of existing methods with the similar model complexity.","PeriodicalId":121713,"journal":{"name":"2022 7th International Conference on Computer and Communication Systems (ICCCS)","volume":"331 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 7th International Conference on Computer and Communication Systems (ICCCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/icccs55155.2022.9846526","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
Action recognition aims to automatically detect and classify human actions in videos; the main difficulty lies in modeling the temporal relationships between frames. The widely used 2D convolutional neural network (CNN) is not well suited to this task because it lacks temporal modeling ability. In this paper, a novel 2D CNN with an inter-frame information extraction module based on a bilinear operation is proposed to address this problem. The model greatly improves the temporal modeling ability of the 2D CNN while introducing only a small amount of additional storage and computation, thanks to a parameter decomposition method. In addition, its flexible form makes it easy to trade off performance against complexity. Finally, the effectiveness of the new network is validated on two kinds of benchmarks, one temporal-related (Something-Something v1) and one scene-related (Mini-Kinetics), achieving top-1 accuracies of 44.5% and 67.8% respectively, which match or exceed the performance of existing methods with similar model complexity.
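The abstract describes an inter-frame bilinear module whose weight is factorized (parameter decomposition) so that a 2D CNN gains temporal modeling ability at little extra storage and computation, with the factorization rank acting as the performance/complexity knob. The following is a minimal PyTorch sketch of such a factorized temporal bilinear block, not the authors' implementation; the module name, the `rank` hyperparameter, the adjacent-frame pairing, and the residual placement are illustrative assumptions.

```python
# Minimal sketch of a factorized temporal bilinear module for a 2D CNN.
# Assumptions (not from the paper): rank-based low-rank factors, pairing each
# frame with its successor, and a residual connection back to the input.
import torch
import torch.nn as nn


class TemporalFactorizedBilinear(nn.Module):
    def __init__(self, channels: int, rank: int = 64):
        super().__init__()
        # Two low-rank 1x1 projections replace a full C x C bilinear weight,
        # so the extra parameters and FLOPs stay small; `rank` trades
        # accuracy against complexity.
        self.proj_a = nn.Conv2d(channels, rank, kernel_size=1, bias=False)
        self.proj_b = nn.Conv2d(channels, rank, kernel_size=1, bias=False)
        self.out = nn.Conv2d(rank, channels, kernel_size=1, bias=False)

    def forward(self, x: torch.Tensor, num_frames: int) -> torch.Tensor:
        # x: (batch * num_frames, C, H, W), the usual 2D-CNN layout for video.
        nt, c, h, w = x.shape
        n = nt // num_frames
        feats = x.view(n, num_frames, c, h, w)
        # Bilinear interaction between each frame and its successor
        # (the last frame is paired with itself to keep the shape).
        nxt = torch.cat([feats[:, 1:], feats[:, -1:]], dim=1)
        a = self.proj_a(feats.reshape(nt, c, h, w))
        b = self.proj_b(nxt.reshape(nt, c, h, w))
        inter = a * b                   # element-wise product in rank space
        return x + self.out(inter)      # residual: spatial features preserved


if __name__ == "__main__":
    m = TemporalFactorizedBilinear(channels=256, rank=64)
    clip = torch.randn(2 * 8, 256, 14, 14)  # 2 clips of 8 frames each
    out = m(clip, num_frames=8)
    print(out.shape)                        # torch.Size([16, 256, 14, 14])
```

Because the block keeps the (N*T, C, H, W) tensor layout and adds a residual, it can be dropped between stages of an ordinary 2D backbone, which mirrors the plug-in, complexity-controllable design the abstract emphasizes.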