Complementary Disentangling and Dynamic Graph Convolution for Skeleton Based Action Recognition

Proceedings of the 2023 4th International Conference on Computing, Networks and Internet of Things. Pub Date: 2023-05-26. DOI: 10.1145/3603781.3603866

Kemin Shi, Lin Xu



Abstract

In this study, we introduce a novel skeleton-based action recognition approach, the Complementary Disentangling and Dynamic Graph Convolution Network (CDD-GCN). The method combines multi-scale graph convolution with multi-head self-attention to model human body structure and motion characteristics. We employ a complementary disentangled-neighborhoods method to generate multi-scale graphs, which eliminates the redundant dependency on nearby nodes when receiving information from distant nodes while preserving as much of the structural information of the human skeleton as possible. In accordance with the characteristics of human skeletal sequences, we improve the self-attention mechanism by introducing temporal pooling, semantic information, a graph importance tuning matrix, and high-probability graph dropout into the dynamic graph generation process, achieving more effective action connections at lower computational complexity. We integrate the self-attention mechanism with the graph convolution process at the feature level, which lets the two learn independently and perform better, and we modify the multi-head feature aggregation of self-attention to be consistent with the graph convolution process, which makes the subsequent fusion smoother. Experimental results demonstrate that CDD-GCN achieves state-of-the-art performance on two large-scale datasets, NTU RGB+D 60 and NTU RGB+D 120, exemplified by 92.7% accuracy on the cross-subject benchmark of NTU RGB+D 60, while maintaining low computational cost.
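The abstract does not give implementation details, so the following is only a rough sketch of two of the ideas it names, under stated assumptions. The first function builds plain disentangled k-hop neighborhoods (joints at exactly distance k), which is the generic notion of removing nearby nodes from a distant-node scale; it is not the authors' complementary variant. The second derives a dynamic joint-to-joint graph from self-attention after averaging features over time (temporal pooling); the semantic information, graph importance tuning matrix, and graph dropout are omitted. All function names, shapes, and the five-joint chain skeleton are hypothetical.

```python
import numpy as np


def k_hop_adjacency(adj: np.ndarray, k: int) -> np.ndarray:
    """Return a binary matrix marking joint pairs whose shortest-path
    distance is exactly k (assumed generic disentangling, not the paper's)."""
    n = adj.shape[0]
    # Adding self-loops makes (A + I)^k > 0 mean "reachable within k hops".
    a_hat = ((adj + np.eye(n)) > 0).astype(np.int64)
    if k == 0:
        return np.eye(n, dtype=np.int64)
    within_k = (np.linalg.matrix_power(a_hat, k) > 0).astype(np.int64)
    within_k_minus_1 = (np.linalg.matrix_power(a_hat, k - 1) > 0).astype(np.int64)
    # Subtracting removes every closer joint, leaving only distance-k pairs.
    return within_k - within_k_minus_1


def pooled_attention_graph(x: np.ndarray, w_q: np.ndarray, w_k: np.ndarray) -> np.ndarray:
    """Build a (V, V) attention graph from features x of shape (T, V, C).

    Pooling over time first means attention is computed once per sequence
    rather than once per frame.
    """
    pooled = x.mean(axis=0)                      # temporal pooling -> (V, C)
    q = pooled @ w_q                             # queries, (V, D)
    k = pooled @ w_k                             # keys, (V, D)
    scores = q @ k.T / np.sqrt(q.shape[1])       # scaled dot-product scores
    scores -= scores.max(axis=1, keepdims=True)  # stabilise the softmax
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)


# Hypothetical toy skeleton: five joints connected in a chain 0-1-2-3-4.
chain = np.zeros((5, 5), dtype=np.int64)
for i in range(4):
    chain[i, i + 1] = chain[i + 1, i] = 1

scales = [k_hop_adjacency(chain, k) for k in range(5)]

rng = np.random.default_rng(0)
features = rng.standard_normal((8, 5, 4))        # (frames, joints, channels)
dynamic_graph = pooled_attention_graph(
    features, rng.standard_normal((4, 3)), rng.standard_normal((4, 3))
)
```

In this sketch each scale contains only the joints at one exact distance, so the scales partition all joint pairs, and the attention graph is row-normalised so that each joint's incoming weights sum to one.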


