Non-Local Spatiotemporal Correlation Attention for Action Recognition
Manh-Hung Ha, O. Chen
2022 IEEE International Conference on Multimedia and Expo Workshops (ICMEW), published 2022-07-18
DOI: 10.1109/ICMEW56448.2022.9859314
Citation count: 3
Abstract
To perceive human actions well, it may be favorable to consider only the useful clues from the human and the scene context during the recognition process. Conventional Deep Neural Networks (DNNs) are built from blocks that compute local neighborhood correlations in the spatial and temporal domains separately. In this work, we develop a DNN that consists of a 3D convolutional neural network, a Non-Local SpatioTemporal Correlation Attention (NSTCA) module, and a classifier to retrieve meaningful semantic context for effective action identification. In particular, the proposed NSTCA module extracts useful visual clues from spatial and temporal features jointly via transposed feature correlation computations, rather than through separate spatial and temporal attention computations. In the experiments, a traffic police dataset was used for analysis and comparison. The experimental results show that the proposed DNN achieves an average accuracy of 98.2%, which is superior to those of conventional DNNs. Therefore, the DNN proposed herein can be widely applied to recognize various actions of subjects in video scenes.
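The abstract does not spell out the NSTCA formulation. As a rough illustration only, the Python sketch below assumes one plausible reading of "transposed feature correlation": the 3D CNN output is flattened into an (N, C) matrix over all N = T·H·W spatiotemporal positions, and a single N×N correlation matrix is computed by multiplying that matrix by its transpose, so spatial and temporal relations are attended to jointly in one attention map instead of through separate spatial and temporal attention steps. This is essentially a simplified non-local (self-attention) block, not the authors' module: learned query/key/value projections, normalization, and the 3D CNN backbone are omitted, and the helper names (`flatten_positions`, `joint_st_attention`, `classify`) and the toy classifier are hypothetical.

```python
import math
from typing import List

# A matrix is a list of rows; (N, C) means N rows of C floats.
Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    """Swap rows and columns."""
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b for (N, K) and (K, M) inputs."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(a: Matrix) -> Matrix:
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def flatten_positions(feat) -> Matrix:
    """Flatten a T x H x W grid of C-dim feature vectors into an (N, C) matrix, N = T*H*W.

    feat[t][h][w] is the feature vector at frame t, row h, column w
    (hypothetical layout standing in for a 3D CNN output).
    """
    return [list(vec) for frame in feat for row in frame for vec in row]


def joint_st_attention(x: Matrix) -> Matrix:
    """Simplified non-local attention over all spatiotemporal positions at once.

    The (N, N) correlation matrix x @ x^T relates every position to every
    other position in both space and time, so no separate spatial and
    temporal attention steps are needed. Learned projections are omitted.
    Returns x + softmax(x @ x^T / sqrt(C)) @ x (residual connection).
    """
    c = len(x[0])
    scale = 1.0 / math.sqrt(c)
    corr = matmul(x, transpose(x))  # (N, N) pairwise feature correlations
    attn = softmax_rows([[v * scale for v in row] for row in corr])
    context = matmul(attn, x)  # (N, C) attention-weighted features
    return [[xi + ci for xi, ci in zip(xr, cr)] for xr, cr in zip(x, context)]


def classify(x: Matrix, weights: Matrix, bias: List[float]) -> List[float]:
    """Hypothetical classifier: average-pool over positions, then a linear layer.

    weights has shape (num_classes, C); returns one logit per class.
    """
    n = len(x)
    pooled = [sum(col) / n for col in zip(*x)]
    return [sum(w * p for w, p in zip(w_row, pooled)) + b for w_row, b in zip(weights, bias)]


if __name__ == "__main__":
    # Toy feature map: T=2 frames, H=1, W=2, C=2 channels (made-up values).
    feat = [
        [[[1.0, 0.0], [0.5, 0.5]]],
        [[[0.0, 1.0], [0.2, 0.8]]],
    ]
    x = flatten_positions(feat)  # N = 4 positions
    y = joint_st_attention(x)
    logits = classify(y, weights=[[1.0, -1.0], [-1.0, 1.0]], bias=[0.0, 0.0])
    print(len(x), len(y[0]), logits)
```

In the toy run, the two frames contribute four positions, so the correlation matrix is 4×4 and each output row mixes features from both frames at once. The trade-off of this joint formulation is cost: one N×N map needs on the order of N²·C operations, whereas factorized spatial and temporal attention works on smaller per-axis maps.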