SelfGCN: Graph Convolution Network With Self-Attention for Skeleton-Based Action Recognition
Zhize Wu, Pengpeng Sun, Xin Chen, Keke Tang, Tong Xu, Le Zou, Xiaofeng Wang, Ming Tan, Fan Cheng, Thomas Weise
IEEE Transactions on Image Processing, published 2024-07-31. DOI: 10.1109/TIP.2024.3433581
Graph Convolutional Networks (GCNs) are widely used for skeleton-based action recognition and have achieved remarkable performance. Because graph convolution is local, GCNs can exploit only short-range node dependencies and fail to model long-range node relationships. In addition, existing graph-convolution-based methods normally use a single, uniform skeleton topology for all frames, which limits their feature-learning ability. To address these issues, we present the Graph Convolution Network with Self-Attention (SelfGCN), which consists of two modules: a mixing features across self-attention and graph convolution (MFSG) module and a temporal-specific spatial self-attention (TSSA) module. The MFSG module models both local and global relationships between joints by executing a graph convolution branch and a self-attention branch in parallel. Its bi-directional interactive learning strategy exploits complementary clues from both branches in the channel and spatial dimensions. The TSSA module uses self-attention to learn the spatial relationships between the joints of each individual frame in a skeleton sequence, thereby modeling the spatial features unique to each frame. We conduct extensive experiments on three popular benchmark datasets: NTU RGB+D, NTU RGB+D 120, and Northwestern-UCLA. The results show that our method matches or exceeds the best previously reported accuracies on all three benchmarks. Our project website is available at https://github.com/SunPengP/SelfGCN.
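The abstract does not give the layer equations. The following is therefore only a minimal NumPy sketch of the two ideas it names, under our own assumptions. The first idea is a block in which a graph-convolution branch (a fixed, symmetrically normalized skeleton adjacency) and a spatial self-attention branch run in parallel and exchange a channel gate and a spatial gate. The second is a spatial self-attention that computes a separate joint-to-joint attention map for every frame. All function names (`normalized_adjacency`, `per_frame_spatial_attention`, `mixed_block`), the tensor layout `(T, V, C)`, the sigmoid gates, and the sum fusion are hypothetical illustrations, not the SelfGCN implementation. The authors' code is in the project repository.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along one axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def normalized_adjacency(edges, num_joints):
    # Fixed skeleton topology with self-loops, normalized as D^-1/2 (A + I) D^-1/2.
    a = np.eye(num_joints)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d[:, None] * d[None, :]


def per_frame_spatial_attention(x, wq, wk):
    # x: (T, V, C) for T frames and V joints. Returns one V x V attention map
    # per frame (the idea named by TSSA; the exact form here is assumed).
    q = x @ wq                                                # (T, V, D)
    k = x @ wk                                                # (T, V, D)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])  # (T, V, V)
    return softmax(scores, axis=-1)


def mixed_block(x, adj, w_gcn, w_att, wq, wk):
    # Hypothetical MFSG-style block: local and global branches run in parallel.
    # Local branch: graph convolution over the fixed skeleton graph.
    local = np.einsum("uv,tvc->tuc", adj, x) @ w_gcn          # (T, V, C_out)
    # Global branch: self-attention over all joints of each frame.
    attn = per_frame_spatial_attention(x, wq, wk)             # (T, V, V)
    glob = attn @ (x @ w_att)                                 # (T, V, C_out)
    # Assumed bi-directional interaction: a channel gate computed from the local
    # branch rescales the global branch, and a per-joint (spatial) gate computed
    # from the global branch rescales the local branch. Outputs are summed.
    channel_gate = sigmoid(local.mean(axis=(0, 1)))           # (C_out,)
    spatial_gate = sigmoid(glob.mean(axis=(0, 2)))            # (V,)
    return local * spatial_gate[None, :, None] + glob * channel_gate[None, None, :]


# Toy usage: 4 frames of a 5-joint chain skeleton with 3 input channels.
rng = np.random.default_rng(0)
T, V, C, C_OUT, D = 4, 5, 3, 8, 6
adj = normalized_adjacency([(0, 1), (1, 2), (2, 3), (3, 4)], V)
x = rng.standard_normal((T, V, C))
wq, wk = rng.standard_normal((C, D)), rng.standard_normal((C, D))
out = mixed_block(x, adj, rng.standard_normal((C, C_OUT)),
                  rng.standard_normal((C, C_OUT)), wq, wk)
print(out.shape)  # (4, 5, 8)
```

In this sketch the graph-convolution branch applies the same `adj` to every frame, while the attention maps `attn[t]` are recomputed from each frame's features and so differ from frame to frame. That per-frame variation is the property the abstract contrasts with a single topology shared by all frames.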