Multi-scale skeleton simplification graph convolutional network for skeleton-based action recognition

IF 1.3 4区计算机科学 Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IET Computer Vision Pub Date : 2024-07-08 DOI:10.1049/cvi2.12300

Fan Zhang, Ding Chongyang, Kai Liu, Liu Hongjin

{"title":"Multi-scale skeleton simplification graph convolutional network for skeleton-based action recognition","authors":"Fan Zhang, Ding Chongyang, Kai Liu, Liu Hongjin","doi":"10.1049/cvi2.12300","DOIUrl":null,"url":null,"abstract":"<p>Human action recognition based on graph convolutional networks (GCNs) is one of the hotspots in computer vision. However, previous methods generally rely on handcrafted graph, which limits the effectiveness of the model in characterising the connections between indirectly connected joints. The limitation leads to weakened connections when joints are separated by long distances. To address the above issue, the authors propose a skeleton simplification method which aims to reduce the number of joints and the distance between joints by merging adjacent joints into simplified joints. Group convolutional block is devised to extract the internal features of the simplified joints. Additionally, the authors enhance the method by introducing multi-scale modelling, which maps inputs into sequences across various levels of simplification. Combining with spatial temporal graph convolution, a multi-scale skeleton simplification GCN for skeleton-based action recognition (M3S-GCN) is proposed for fusing multi-scale skeleton sequences and modelling the connections between joints. Finally, M3S-GCN is evaluated on five benchmarks of NTU RGB+D 60 (C-Sub, C-View), NTU RGB+D 120 (X-Sub, X-Set) and NW-UCLA datasets. Experimental results show that the authors’ M3S-GCN achieves state-of-the-art performance with the accuracies of 93.0%, 97.0% and 91.2% on C-Sub, C-View and X-Set benchmarks, which validates the effectiveness of the method.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 7","pages":"992-1003"},"PeriodicalIF":1.3000,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12300","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IET Computer Vision","FirstCategoryId":"94","ListUrlMain":"https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/cvi2.12300","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

Human action recognition based on graph convolutional networks (GCNs) is one of the hotspots in computer vision. However, previous methods generally rely on handcrafted graph, which limits the effectiveness of the model in characterising the connections between indirectly connected joints. The limitation leads to weakened connections when joints are separated by long distances. To address the above issue, the authors propose a skeleton simplification method which aims to reduce the number of joints and the distance between joints by merging adjacent joints into simplified joints. Group convolutional block is devised to extract the internal features of the simplified joints. Additionally, the authors enhance the method by introducing multi-scale modelling, which maps inputs into sequences across various levels of simplification. Combining with spatial temporal graph convolution, a multi-scale skeleton simplification GCN for skeleton-based action recognition (M3S-GCN) is proposed for fusing multi-scale skeleton sequences and modelling the connections between joints. Finally, M3S-GCN is evaluated on five benchmarks of NTU RGB+D 60 (C-Sub, C-View), NTU RGB+D 120 (X-Sub, X-Set) and NW-UCLA datasets. Experimental results show that the authors’ M3S-GCN achieves state-of-the-art performance with the accuracies of 93.0%, 97.0% and 91.2% on C-Sub, C-View and X-Set benchmarks, which validates the effectiveness of the method.

Abstract Image

查看原文本刊更多论文

基于骨骼动作识别的多尺度骨骼简化图卷积网络

基于图卷积网络（GCN）的人类动作识别是计算机视觉领域的热点之一。然而，以往的方法通常依赖于手工制作的图，这就限制了模型在描述间接连接的关节之间的连接时的有效性。当关节之间的距离较远时，这种限制会导致连接减弱。为解决上述问题，作者提出了一种骨架简化方法，旨在通过将相邻关节合并为简化关节来减少关节数量和关节间距。组卷积块用于提取简化关节的内部特征。此外，作者还通过引入多尺度建模，将输入映射到不同简化级别的序列中，从而增强了该方法。结合空间时间图卷积，作者提出了一种用于基于骨骼的动作识别的多尺度骨骼简化 GCN（M3S-GCN），用于融合多尺度骨骼序列并对关节之间的连接进行建模。最后，M3S-GCN 在 NTU RGB+D 60（C-Sub、C-View）、NTU RGB+D 120（X-Sub、X-Set）和 NW-UCLA 数据集的五个基准上进行了评估。实验结果表明，作者的 M3S-GCN 在 C-Sub、C-View 和 X-Set 基准上的准确率分别为 93.0%、97.0% 和 91.2%，达到了最先进的水平，验证了该方法的有效性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

IET Computer Vision 工程技术-工程：电子与电气

CiteScore

3.30

自引率

11.80%

发文量

审稿时长

3.4 months

期刊介绍： IET Computer Vision seeks original research papers in a wide range of areas of computer vision. The vision of the journal is to publish the highest quality research work that is relevant and topical to the field, but not forgetting those works that aim to introduce new horizons and set the agenda for future avenues of research in computer vision. IET Computer Vision welcomes submissions on the following topics: Biologically and perceptually motivated approaches to low level vision (feature detection, etc.); Perceptual grouping and organisation Representation, analysis and matching of 2D and 3D shape Shape-from-X Object recognition Image understanding Learning with visual inputs Motion analysis and object tracking Multiview scene analysis Cognitive approaches in low, mid and high level vision Control in visual systems Colour, reflectance and light Statistical and probabilistic models Face and gesture Surveillance Biometrics and security Robotics Vehicle guidance Automatic model aquisition Medical image analysis and understanding Aerial scene analysis and remote sensing Deep learning models in computer vision Both methodological and applications orientated papers are welcome. Manuscripts submitted are expected to include a detailed and analytical review of the literature and state-of-the-art exposition of the original proposed research and its methodology, its thorough experimental evaluation, and last but not least, comparative evaluation against relevant and state-of-the-art methods. Submissions not abiding by these minimum requirements may be returned to authors without being sent to review. Special Issues Current Call for Papers: Computer Vision for Smart Cameras and Camera Networks - https://digital-library.theiet.org/files/IET_CVI_SC.pdf Computer Vision for the Creative Industries - https://digital-library.theiet.org/files/IET_CVI_CVCI.pdf