Temporal channel reconfiguration multi-graph convolution network for skeleton-based action recognition

IF 1.5 4区计算机科学 Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IET Computer Vision Pub Date : 2024-04-17 DOI:10.1049/cvi2.12279

Siyue Lei, Bin Tang, Yanhua Chen, Mingfu Zhao, Yifei Xu, Zourong Long

{"title":"Temporal channel reconfiguration multi-graph convolution network for skeleton-based action recognition","authors":"Siyue Lei, Bin Tang, Yanhua Chen, Mingfu Zhao, Yifei Xu, Zourong Long","doi":"10.1049/cvi2.12279","DOIUrl":null,"url":null,"abstract":"<p>Skeleton-based action recognition has received much attention and achieved remarkable achievements in the field of human action recognition. In time series action prediction for different scales, existing methods mainly focus on attention mechanisms to enhance modelling capabilities in spatial dimensions. However, this approach strongly depends on the local information of a single input feature and fails to facilitate the flow of information between channels. To address these issues, the authors propose a novel Temporal Channel Reconfiguration Multi-Graph Convolution Network (TRMGCN). In the temporal convolution part, the authors designed a module called Temporal Channel Fusion with Guidance (TCFG) to capture important temporal information within channels at different scales and avoid ignoring cross-spatio-temporal dependencies among joints. In the graph convolution part, the authors propose Top-Down Attention Multi-graph Independent Convolution (TD-MIG), which uses multi-graph independent convolution to learn the topological graph feature for different length time series. Top-down attention is introduced for spatial and channel modulation to facilitate information flow in channels that do not establish topological relationships. Experimental results on the large-scale datasets NTU-RGB + D60 and 120, as well as UAV-Human, demonstrate that TRMGCN exhibits advanced performance and capabilities. Furthermore, experiments on the smaller dataset NW-UCLA have indicated that the authors’ model possesses strong generalisation abilities.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 6","pages":"813-825"},"PeriodicalIF":1.5000,"publicationDate":"2024-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12279","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IET Computer Vision","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1049/cvi2.12279","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

Skeleton-based action recognition has received much attention and achieved remarkable achievements in the field of human action recognition. In time series action prediction for different scales, existing methods mainly focus on attention mechanisms to enhance modelling capabilities in spatial dimensions. However, this approach strongly depends on the local information of a single input feature and fails to facilitate the flow of information between channels. To address these issues, the authors propose a novel Temporal Channel Reconfiguration Multi-Graph Convolution Network (TRMGCN). In the temporal convolution part, the authors designed a module called Temporal Channel Fusion with Guidance (TCFG) to capture important temporal information within channels at different scales and avoid ignoring cross-spatio-temporal dependencies among joints. In the graph convolution part, the authors propose Top-Down Attention Multi-graph Independent Convolution (TD-MIG), which uses multi-graph independent convolution to learn the topological graph feature for different length time series. Top-down attention is introduced for spatial and channel modulation to facilitate information flow in channels that do not establish topological relationships. Experimental results on the large-scale datasets NTU-RGB + D60 and 120, as well as UAV-Human, demonstrate that TRMGCN exhibits advanced performance and capabilities. Furthermore, experiments on the smaller dataset NW-UCLA have indicated that the authors’ model possesses strong generalisation abilities.

Abstract Image

查看原文本刊更多论文

基于骨架的动作识别时态信道重构多图卷积网络

基于骨架的动作识别在人类动作识别领域受到广泛关注，并取得了显著成就。在不同尺度的时间序列动作预测中，现有方法主要关注注意力机制，以增强空间维度的建模能力。然而，这种方法严重依赖于单一输入特征的局部信息，无法促进通道间的信息流动。为了解决这些问题，作者提出了一种新颖的时空通道重构多图卷积网络（TRMGCN）。在时空卷积部分，作者设计了一个名为 "带引导的时空信道融合（TCFG）"的模块，以捕捉不同尺度信道内的重要时空信息，避免忽略关节点之间的跨时空依赖关系。在图卷积部分，作者提出了自上而下注意力多图独立卷积（TD-MIG），它使用多图独立卷积来学习不同长度时间序列的拓扑图特征。在空间和信道调制中引入了自上而下注意，以促进不建立拓扑关系的信道中的信息流动。在大型数据集 NTU-RGB + D60 和 120 以及 UAV-Human 上的实验结果表明，TRMGCN 具有先进的性能和能力。此外，在较小数据集 NW-UCLA 上的实验结果表明，作者的模型具有很强的泛化能力。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

IET Computer Vision 工程技术-工程：电子与电气

CiteScore

3.30

自引率

11.80%

发文量

审稿时长

3.4 months

期刊介绍： IET Computer Vision seeks original research papers in a wide range of areas of computer vision. The vision of the journal is to publish the highest quality research work that is relevant and topical to the field, but not forgetting those works that aim to introduce new horizons and set the agenda for future avenues of research in computer vision. IET Computer Vision welcomes submissions on the following topics: Biologically and perceptually motivated approaches to low level vision (feature detection, etc.); Perceptual grouping and organisation Representation, analysis and matching of 2D and 3D shape Shape-from-X Object recognition Image understanding Learning with visual inputs Motion analysis and object tracking Multiview scene analysis Cognitive approaches in low, mid and high level vision Control in visual systems Colour, reflectance and light Statistical and probabilistic models Face and gesture Surveillance Biometrics and security Robotics Vehicle guidance Automatic model aquisition Medical image analysis and understanding Aerial scene analysis and remote sensing Deep learning models in computer vision Both methodological and applications orientated papers are welcome. Manuscripts submitted are expected to include a detailed and analytical review of the literature and state-of-the-art exposition of the original proposed research and its methodology, its thorough experimental evaluation, and last but not least, comparative evaluation against relevant and state-of-the-art methods. Submissions not abiding by these minimum requirements may be returned to authors without being sent to review. Special Issues Current Call for Papers: Computer Vision for Smart Cameras and Camera Networks - https://digital-library.theiet.org/files/IET_CVI_SC.pdf Computer Vision for the Creative Industries - https://digital-library.theiet.org/files/IET_CVI_CVCI.pdf