Yiran Zhu, Guangji Huang, Xing Xu, Yanli Ji, Fumin Shen
Proceedings of the 2022 International Conference on Multimedia Retrieval, 2022-06-27. DOI: 10.1145/3512527.3531367
Selective Hypergraph Convolutional Networks for Skeleton-based Action Recognition
In skeleton-based action recognition, Graph Convolutional Networks (GCNs) have achieved remarkable performance, since the skeleton representation of human action is naturally modeled by a graph structure. Most existing GCN-based methods extract skeleton features from single-scale joint information, neglecting valuable multi-scale contextual information. Moreover, the strided convolution commonly used in the temporal dimension subsamples frames uniformly, which can discard the very keyframes we wish to preserve and thus lose keyframe information. To address these issues, we propose a novel Selective Hypergraph Convolution Network, dubbed Selective-HCN, which stacks two key modules: Selective-scale Hypergraph Convolution (SHC) and Selective-frame Temporal Convolution (STC). The SHC module represents the human skeleton as both a graph and a hypergraph to fully extract multi-scale information, and selectively fuses features across scales. In place of traditional strided temporal convolution, the STC module adaptively selects keyframes and filters out redundant frames according to each frame's importance. Extensive experiments on two challenging skeleton action benchmarks, i.e., NTU-RGB+D and Skeleton-Kinetics, demonstrate the superiority and effectiveness of our proposed method.
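To make the two ideas in the abstract concrete, here is a minimal NumPy sketch (not the authors' code): a single hypergraph convolution in the standard form X' = Dv^-1 H De^-1 H^T X Θ, as used by hypergraph networks like the SHC module, and an importance-based keyframe selection in the spirit of the STC module. The incidence matrix, feature shapes, and the norm-based importance score are illustrative assumptions; Selective-HCN learns its scores and scale fusion end to end.

```python
import numpy as np

def hypergraph_conv(X, H, Theta):
    """One hypergraph convolution step: X' = Dv^-1 H De^-1 H^T X Theta.
    X: (V, C) joint features; H: (V, E) vertex-hyperedge incidence matrix;
    Theta: (C, C_out) learnable weights. Edge weights omitted for brevity."""
    Dv_inv = np.diag(1.0 / H.sum(axis=1))   # inverse vertex degrees
    De_inv = np.diag(1.0 / H.sum(axis=0))   # inverse hyperedge degrees
    return Dv_inv @ H @ De_inv @ H.T @ X @ Theta

def select_keyframes(X_seq, k):
    """Keep the k frames with the largest feature-norm 'importance' score
    (a hypothetical stand-in for a learned score) instead of uniform
    strided subsampling. X_seq: (T, V, C); returns frames in time order."""
    scores = np.linalg.norm(X_seq, axis=(1, 2))   # one score per frame, (T,)
    keep = np.sort(np.argsort(scores)[-k:])       # top-k indices, time-ordered
    return X_seq[keep]

# Toy example: 5 joints, 2 hyperedges grouping joints into limb-like parts.
rng = np.random.default_rng(0)
H = np.array([[1, 0], [1, 0], [1, 1], [0, 1], [0, 1]], dtype=float)
X = rng.standard_normal((5, 3))
Theta = rng.standard_normal((3, 4))
print(hypergraph_conv(X, H, Theta).shape)      # (5, 4)

seq = rng.standard_normal((16, 5, 3))
print(select_keyframes(seq, 4).shape)          # (4, 5, 3)
```

The point of the second function is the contrast the abstract draws: `seq[::4]` would keep every fourth frame regardless of content, while score-based selection keeps the frames deemed most informative.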