Complementary Disentangling and Dynamic Graph Convolution for Skeleton Based Action Recognition
Kemin Shi, Lin Xu
Proceedings of the 2023 4th International Conference on Computing, Networks and Internet of Things (published 2023-05-26)
DOI: 10.1145/3603781.3603866
Abstract
In this study, we introduce a novel skeleton-based action recognition approach, the Complementary Disentangling and Dynamic Graph Convolution Network (CDD-GCN). The method combines multi-scale graph convolution and multi-head self-attention to model both the structure of the human body and the characteristics of its motion. We use a complementary disentangled-neighborhoods method to generate multi-scale graphs; it removes the redundant dependency on nearby nodes when information is received from distant nodes, while preserving the structural features of the human skeleton as fully as possible. To suit the characteristics of human skeleton sequences, we improve the self-attention mechanism by introducing temporal pooling, semantic information, a graph-importance tuning matrix, and high-probability graph dropout into the dynamic graph generation process, which yields more effective action-related connections at lower computational complexity. We integrate the self-attention mechanism with the graph convolution process at the feature level, which lets each learn independently and improves the performance of both, and we modify the multi-head feature aggregation of self-attention to match the graph convolution process, which makes their subsequent fusion smoother. Experimental results show that CDD-GCN achieves state-of-the-art performance on two large-scale datasets, NTU RGB+D 60 and NTU RGB+D 120, including 92.7% accuracy on the cross-subject benchmark of NTU RGB+D 60, while keeping computational cost low.
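The abstract does not give the exact constructions, so the following is only a minimal plain-Python sketch of the two graph sources it describes, under stated assumptions. (1) Static multi-scale graphs use the standard disentangled k-hop formulation (scale k links joints exactly k hops apart, plus self-loops, then symmetric normalization); whether the paper's "complementary" variant differs from this is not stated in the abstract. (2) A dynamic graph is derived from temporally pooled, single-head scaled dot-product attention, multiplied element-wise by an importance matrix and thinned by inverted edge dropout; the element-wise tuning, the dropout form, and the omission of semantic embeddings and multiple heads are all assumptions. The function names (`hop_distances`, `disentangled_adjacency`, `project`, `dynamic_graph`, `graph_conv`), the `[T][V][C]` feature layout, the toy 4-joint chain, and the plain-sum fusion are hypothetical; the paper fuses at the feature level with learned components.

```python
import math
import random
from collections import deque


def hop_distances(num_joints, edges):
    """All-pairs shortest hop distances on the undirected skeleton graph (BFS)."""
    neighbors = [[] for _ in range(num_joints)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    dist = [[math.inf] * num_joints for _ in range(num_joints)]
    for src in range(num_joints):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if dist[src][v] == math.inf:
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist


def disentangled_adjacency(num_joints, edges, num_scales):
    """Scale k (k = 1..num_scales) links joints exactly k hops apart, plus self-loops,
    so distant joints are reached without re-counting nearer ones. Each matrix is
    normalized as D^-1/2 A D^-1/2. Index 0 of the returned list is k = 1."""
    dist = hop_distances(num_joints, edges)
    scales = []
    for k in range(1, num_scales + 1):
        a = [[1.0 if dist[i][j] == k or i == j else 0.0 for j in range(num_joints)]
             for i in range(num_joints)]
        deg = [sum(row) for row in a]  # always >= 1 because of the self-loop
        scales.append([[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
                       for i in range(num_joints)])
    return scales


def project(w, vec):
    """Multiply a D x C weight matrix (list of rows) by a length-C vector."""
    return [sum(w_dc * v_c for w_dc, v_c in zip(row, vec)) for row in w]


def dynamic_graph(x, w_q, w_k, importance, drop_prob=0.0, training=False, rng=random):
    """V x V attention graph from features x laid out as [T][V][C] (assumed layout):
    1) average over frames (temporal pooling), 2) row-wise softmax(Q K^T / sqrt(D)),
    3) scale element-wise by a learnable importance matrix (assumed form),
    4) when training, drop each edge with probability drop_prob (inverted dropout)."""
    T, V, C = len(x), len(x[0]), len(x[0][0])
    pooled = [[sum(x[t][v][c] for t in range(T)) / T for c in range(C)] for v in range(V)]
    q = [project(w_q, p) for p in pooled]
    k = [project(w_k, p) for p in pooled]
    scale = math.sqrt(len(w_q))  # D, the query/key width
    graph = []
    for i in range(V):
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / scale for j in range(V)]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        row = [e / total * importance[i][j] for j, e in enumerate(exps)]
        if training and drop_prob > 0:
            keep = 1.0 - drop_prob
            row = [r / keep if rng.random() >= drop_prob else 0.0 for r in row]
        graph.append(row)
    return graph


def graph_conv(x, adjacency):
    """Aggregate joint features over one graph: out[t][i] = sum_j A[i][j] * x[t][j]."""
    V, C = len(adjacency), len(x[0][0])
    return [[[sum(adjacency[i][j] * frame[j][c] for j in range(V)) for c in range(C)]
             for i in range(V)] for frame in x]


# Toy example: a 4-joint chain (not a real NTU skeleton), T=5 frames, C=3 channels.
edges = [(0, 1), (1, 2), (2, 3)]
static = disentangled_adjacency(4, edges, num_scales=2)
rng = random.Random(0)
x = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)] for _ in range(5)]
w_q = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
w_k = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
importance = [[1.0] * 4 for _ in range(4)]
dyn = dynamic_graph(x, w_q, w_k, importance)
# Simplified fusion: sum the per-graph aggregations (no learned projections).
outputs = [graph_conv(x, a) for a in static + [dyn]]
fused = [[[sum(o[t][v][c] for o in outputs) for c in range(3)] for v in range(4)]
         for t in range(5)]
```

Building each scale from exact hop distances, rather than from powers of the adjacency matrix, is what keeps distant-joint aggregation from being dominated by nearby joints: in the toy chain, the 2-hop graph links joint 0 to joint 2 but not to its immediate neighbor, joint 1.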