Cheng Zhang, Jianqi Zhong, Wenming Cao, Jianhua Ji
{"title":"Multiple Distilling-based spatial-temporal attention networks for unsupervised human action recognition","authors":"Cheng Zhang, Jianqi Zhong, Wenming Cao, Jianhua Ji","doi":"10.3233/ida-230399","DOIUrl":null,"url":null,"abstract":"Unsupervised action recognition based on spatiotemporal fusion feature extraction has attracted much attention in recent years. However, existing methods still have several limitations: (1) The long-term dependence relationship is not effectively extracted at the time level. (2) The high-order motion relationship between non-adjacent nodes is not effectively captured at the spatial level. (3) The model complexity is too high when the cascade layer input sequence is long, or there are many key points. To solve these problems, a Multiple Distilling-based spatial-temporal attention (MD-STA) networks is proposed in this paper. This model can extract temporal and spatial features respectively and fuse them. Specifically, we first propose a Screening Self-attention (SSA) module; this module can find long-term dependencies in distant frames and high-order motion patterns between non-adjacent nodes in a single frame through a sparse metric on dot product pairs. Then, we propose the Frames and Keypoint-Distilling (FKD) module, which uses extraction operations to halve the input of the cascade layer to eliminate invalid key points and time frame features, thus reducing time and memory complexity. Finally, the Dim-reduction Fusion (DRF) module is proposed to reduce the dimension of existing features to further eliminate redundancy. Numerous experiments were conducted on three distinct datasets: NTU-60, NTU-120, and UWA3D, showing that MD-STA achieves state-of-the-art standards in skeleton-based unsupervised action recognition.","PeriodicalId":50355,"journal":{"name":"Intelligent Data Analysis","volume":"131 1","pages":"0"},"PeriodicalIF":0.9000,"publicationDate":"2023-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Intelligent Data Analysis","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3233/ida-230399","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Unsupervised action recognition based on spatiotemporal fusion feature extraction has attracted much attention in recent years. However, existing methods still have several limitations: (1) The long-term dependence relationship is not effectively extracted at the time level. (2) The high-order motion relationship between non-adjacent nodes is not effectively captured at the spatial level. (3) The model complexity is too high when the cascade layer input sequence is long, or there are many key points. To solve these problems, a Multiple Distilling-based spatial-temporal attention (MD-STA) networks is proposed in this paper. This model can extract temporal and spatial features respectively and fuse them. Specifically, we first propose a Screening Self-attention (SSA) module; this module can find long-term dependencies in distant frames and high-order motion patterns between non-adjacent nodes in a single frame through a sparse metric on dot product pairs. Then, we propose the Frames and Keypoint-Distilling (FKD) module, which uses extraction operations to halve the input of the cascade layer to eliminate invalid key points and time frame features, thus reducing time and memory complexity. Finally, the Dim-reduction Fusion (DRF) module is proposed to reduce the dimension of existing features to further eliminate redundancy. Numerous experiments were conducted on three distinct datasets: NTU-60, NTU-120, and UWA3D, showing that MD-STA achieves state-of-the-art standards in skeleton-based unsupervised action recognition.
期刊介绍:
Intelligent Data Analysis provides a forum for the examination of issues related to the research and applications of Artificial Intelligence techniques in data analysis across a variety of disciplines. These techniques include (but are not limited to): all areas of data visualization, data pre-processing (fusion, editing, transformation, filtering, sampling), data engineering, database mining techniques, tools and applications, use of domain knowledge in data analysis, big data applications, evolutionary algorithms, machine learning, neural nets, fuzzy logic, statistical pattern recognition, knowledge filtering, and post-processing. In particular, papers are preferred that discuss development of new AI related data analysis architectures, methodologies, and techniques and their applications to various domains.