Thean Chun Koh, Chai Kiat Yeo, Xuan Jing, Sunil Sivadas
{"title":"Towards efficient video-based action recognition: context-aware memory attention network","authors":"Thean Chun Koh, Chai Kiat Yeo, Xuan Jing, Sunil Sivadas","doi":"10.1007/s42452-023-05568-5","DOIUrl":null,"url":null,"abstract":"Abstract Given the prevalence of surveillance cameras in our daily lives, human action recognition from videos holds significant practical applications. A persistent challenge in this field is to develop more efficient models capable of real-time recognition with high accuracy for widespread implementation. In this research paper, we introduce a novel human action recognition model named Context-Aware Memory Attention Network (CAMA-Net), which eliminates the need for optical flow extraction and 3D convolution which are computationally intensive. By removing these components, CAMA-Net achieves superior efficiency compared to many existing approaches in terms of computation efficiency. A pivotal component of CAMA-Net is the Context-Aware Memory Attention Module, an attention module that computes the relevance score between key-value pairs obtained from the 2D ResNet backbone. This process establishes correspondences between video frames. To validate our method, we conduct experiments on four well-known action recognition datasets: ActivityNet, Diving48, HMDB51 and UCF101. The experimental results convincingly demonstrate the effectiveness of our proposed model, surpassing the performance of existing 2D-CNN based baseline models. Article Highlights Recent human action recognition models are not yet ready for practical applications due to high computation needs. We propose a 2D CNN-based human action recognition method to reduce the computation load. The proposed method achieves competitive performance compared to most SOTA 2D CNN-based methods on public datasets.","PeriodicalId":21821,"journal":{"name":"SN Applied Sciences","volume":"24 21","pages":"0"},"PeriodicalIF":2.8000,"publicationDate":"2023-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"SN Applied Sciences","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s42452-023-05568-5","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}
引用次数: 0
Abstract
Abstract Given the prevalence of surveillance cameras in our daily lives, human action recognition from videos holds significant practical applications. A persistent challenge in this field is to develop more efficient models capable of real-time recognition with high accuracy for widespread implementation. In this research paper, we introduce a novel human action recognition model named Context-Aware Memory Attention Network (CAMA-Net), which eliminates the need for optical flow extraction and 3D convolution which are computationally intensive. By removing these components, CAMA-Net achieves superior efficiency compared to many existing approaches in terms of computation efficiency. A pivotal component of CAMA-Net is the Context-Aware Memory Attention Module, an attention module that computes the relevance score between key-value pairs obtained from the 2D ResNet backbone. This process establishes correspondences between video frames. To validate our method, we conduct experiments on four well-known action recognition datasets: ActivityNet, Diving48, HMDB51 and UCF101. The experimental results convincingly demonstrate the effectiveness of our proposed model, surpassing the performance of existing 2D-CNN based baseline models. Article Highlights Recent human action recognition models are not yet ready for practical applications due to high computation needs. We propose a 2D CNN-based human action recognition method to reduce the computation load. The proposed method achieves competitive performance compared to most SOTA 2D CNN-based methods on public datasets.