{"title":"Behavior recognition algorithm based on motion capture and enhancement","authors":"Yuqi Yang, Jianping Luo","doi":"10.1117/12.2689663","DOIUrl":null,"url":null,"abstract":"Motion modeling and temporal modeling are crucial issues for video behavior recognition. When extracting motion information in two-stream network, the optical flow diagram needs to be calculated in advance and the end-to-end training cannot be realized. 3D CNNs can extract spatiotemporal information, but it requires huge computational resources. To solve these problems, we propose a plug-and-play motion capture and enhancement network (MCE) in this paper, which consists of a temporal motion capture module (TMC) and a multi-scale spatiotemporal enhancement module (MSTE). The TMC module calculates the temporal difference of the feature-level and captures the key motion information in the short temporal range. The MSTE module simulates long-range temporal information by equivalent enlarging the temporal sensitive field through multi-scale hierarchical sub-convolution architecture, and then further enhances the significant motion features by referring to the maxpooling branch. Finally, several experiments are carried out on the behavior recognition standard datasets of Something-Something-V1 and Jester, and the recognition accuracy rates are 49.6% and 96.9%, respectively. Experimental results show that the proposed method is effective and efficient.","PeriodicalId":118234,"journal":{"name":"4th International Conference on Information Science, Electrical and Automation Engineering","volume":"172 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"4th International Conference on Information Science, Electrical and Automation Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1117/12.2689663","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Motion modeling and temporal modeling are crucial for video behavior recognition. Two-stream networks extract motion information from optical flow, which must be computed in advance and therefore prevents end-to-end training, while 3D CNNs can extract spatiotemporal information but demand substantial computational resources. To address these problems, this paper proposes a plug-and-play motion capture and enhancement (MCE) network consisting of a temporal motion capture (TMC) module and a multi-scale spatiotemporal enhancement (MSTE) module. The TMC module computes feature-level temporal differences to capture key motion information over short temporal ranges. The MSTE module models long-range temporal information by equivalently enlarging the temporal receptive field with a multi-scale hierarchical sub-convolution architecture, and further enhances salient motion features with a max-pooling branch. Experiments on the standard behavior recognition datasets Something-Something-V1 and Jester yield recognition accuracies of 49.6% and 96.9%, respectively, showing that the proposed method is both effective and efficient.
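As a rough illustration of the feature-level temporal difference idea behind the TMC module, the sketch below shows a minimal plug-and-play PyTorch block that subtracts adjacent frame features and reuses the result as a residual attention map over the original features. This is a sketch under stated assumptions, not the paper's implementation: the class name TemporalDifference, the (N*T, C, H, W) input layout, the channel reduction ratio, and the sigmoid attention design are all illustrative choices that the abstract does not specify.

```python
# Hypothetical sketch of a feature-level temporal difference block, assuming
# PyTorch and frame features stacked along the batch dimension as (N*T, C, H, W).
import torch
import torch.nn as nn


class TemporalDifference(nn.Module):
    """Capture short-range motion as differences between adjacent frame features."""

    def __init__(self, channels: int, n_frames: int, reduce_ratio: int = 4):
        super().__init__()
        self.n_frames = n_frames
        mid = channels // reduce_ratio
        # 1x1 convolutions to cheaply reduce and restore the channel dimension.
        self.squeeze = nn.Conv2d(channels, mid, kernel_size=1, bias=False)
        self.expand = nn.Conv2d(mid, channels, kernel_size=1, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        nt, _, h, w = x.shape
        t = self.n_frames
        n = nt // t
        # Reduce channels, then regroup frames per clip: (N, T, C', H, W).
        feat = self.squeeze(x).view(n, t, -1, h, w)
        # Difference between each frame and its successor; pad the last step with zeros.
        diff = feat[:, 1:] - feat[:, :-1]
        diff = torch.cat([diff, torch.zeros_like(diff[:, :1])], dim=1)
        # Turn the motion cue into a per-channel attention map over the input.
        attn = self.sigmoid(self.expand(diff.reshape(nt, -1, h, w)))
        # Residual enhancement: keep the original features, boost motion-salient ones.
        return x + x * attn


# Usage example (illustrative shapes): 2 clips of 8 frames, 64-channel features.
block = TemporalDifference(channels=64, n_frames=8)
features = torch.randn(2 * 8, 64, 28, 28)
out = block(features)  # same shape as the input: (16, 64, 28, 28)
```

Because the block preserves the input shape, it can in principle be inserted after any stage of a 2D backbone, which is the sense in which the abstract calls the design plug-and-play; the multi-scale sub-convolution and max-pooling branch of the MSTE module are not reproduced here.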