Mohammed H Al-Hakimi, Ibrar Ahmed, Muhammad Haseeb, Taha H Rassem, Fahmi H Quradaa, Rashad S Almoqbily
{"title":"基于骨架动作识别的高阶特征增强时空图卷积网络。","authors":"Mohammed H Al-Hakimi, Ibrar Ahmed, Muhammad Haseeb, Taha H Rassem, Fahmi H Quradaa, Rashad S Almoqbily","doi":"10.1371/journal.pone.0332815","DOIUrl":null,"url":null,"abstract":"<p><p>Skeleton-based action recognition has emerged as a promising field within computer vision, offering structured representations of human motion. While existing Graph Convolutional Network (GCN)-based approaches primarily rely on raw 3D joint coordinates, these representations fail to capture higher-order spatial and temporal dependencies critical for distinguishing fine-grained actions. In this study, we introduce novel geometric features for joints, bones, and motion streams, including multi-level spatial normalization, higher-order temporal derivatives, and bone-structure encoding through lengths, angles, and anatomical distances. These enriched features explicitly model kinematic and structural relationships, enabling the capture of subtle motion dynamics and discriminative patterns. Building on this, we propose two architectures: (i) an Enhanced Multi-Stream AGCN (EMS-AGCN) that integrates joint, bone, and motion features via a weighted fusion at the final layer, and (ii) a Multi-Branch AGCN (MB-AGCN) where features are processed in independent branches and fused adaptively at an early layer. Comprehensive experiments on the NTU-RGB+D 60 benchmark demonstrate the effectiveness of our approach: EMS-AGCN achieves 96.2% accuracy and MB-AGCN attains 95.5%, both surpassing state-of-the-art methods. These findings confirm that incorporating higher-order geometric features alongside adaptive fusion mechanisms substantially improves skeleton-based action recognition.</p>","PeriodicalId":20189,"journal":{"name":"PLoS ONE","volume":"20 10","pages":"e0332815"},"PeriodicalIF":2.6000,"publicationDate":"2025-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12510661/pdf/","citationCount":"0","resultStr":"{\"title\":\"An enhanced spatial-temporal graph convolution network with high order features for skeleton-based action recognition.\",\"authors\":\"Mohammed H Al-Hakimi, Ibrar Ahmed, Muhammad Haseeb, Taha H Rassem, Fahmi H Quradaa, Rashad S Almoqbily\",\"doi\":\"10.1371/journal.pone.0332815\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Skeleton-based action recognition has emerged as a promising field within computer vision, offering structured representations of human motion. While existing Graph Convolutional Network (GCN)-based approaches primarily rely on raw 3D joint coordinates, these representations fail to capture higher-order spatial and temporal dependencies critical for distinguishing fine-grained actions. In this study, we introduce novel geometric features for joints, bones, and motion streams, including multi-level spatial normalization, higher-order temporal derivatives, and bone-structure encoding through lengths, angles, and anatomical distances. These enriched features explicitly model kinematic and structural relationships, enabling the capture of subtle motion dynamics and discriminative patterns. Building on this, we propose two architectures: (i) an Enhanced Multi-Stream AGCN (EMS-AGCN) that integrates joint, bone, and motion features via a weighted fusion at the final layer, and (ii) a Multi-Branch AGCN (MB-AGCN) where features are processed in independent branches and fused adaptively at an early layer. 
Comprehensive experiments on the NTU-RGB+D 60 benchmark demonstrate the effectiveness of our approach: EMS-AGCN achieves 96.2% accuracy and MB-AGCN attains 95.5%, both surpassing state-of-the-art methods. These findings confirm that incorporating higher-order geometric features alongside adaptive fusion mechanisms substantially improves skeleton-based action recognition.</p>\",\"PeriodicalId\":20189,\"journal\":{\"name\":\"PLoS ONE\",\"volume\":\"20 10\",\"pages\":\"e0332815\"},\"PeriodicalIF\":2.6000,\"publicationDate\":\"2025-10-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12510661/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"PLoS ONE\",\"FirstCategoryId\":\"103\",\"ListUrlMain\":\"https://doi.org/10.1371/journal.pone.0332815\",\"RegionNum\":3,\"RegionCategory\":\"综合性期刊\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q1\",\"JCRName\":\"MULTIDISCIPLINARY SCIENCES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"PLoS ONE","FirstCategoryId":"103","ListUrlMain":"https://doi.org/10.1371/journal.pone.0332815","RegionNum":3,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q1","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}
An enhanced spatial-temporal graph convolution network with high order features for skeleton-based action recognition.
Skeleton-based action recognition has emerged as a promising field within computer vision, offering structured representations of human motion. Existing Graph Convolutional Network (GCN)-based approaches rely primarily on raw 3D joint coordinates, a representation that fails to capture the higher-order spatial and temporal dependencies critical for distinguishing fine-grained actions. In this study, we introduce novel geometric features for joints, bones, and motion streams, including multi-level spatial normalization, higher-order temporal derivatives, and bone-structure encoding through lengths, angles, and anatomical distances. These enriched features explicitly model kinematic and structural relationships, enabling the capture of subtle motion dynamics and discriminative patterns. Building on this, we propose two architectures: (i) an Enhanced Multi-Stream AGCN (EMS-AGCN) that integrates joint, bone, and motion features via weighted fusion at the final layer, and (ii) a Multi-Branch AGCN (MB-AGCN) in which features are processed in independent branches and fused adaptively at an early layer. Comprehensive experiments on the NTU RGB+D 60 benchmark demonstrate the effectiveness of our approach: EMS-AGCN achieves 96.2% accuracy and MB-AGCN attains 95.5%, both surpassing state-of-the-art methods. These findings confirm that incorporating higher-order geometric features alongside adaptive fusion mechanisms substantially improves skeleton-based action recognition.
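The abstract describes the feature design and fusion strategy only at a high level, so the following is a minimal Python/NumPy sketch of the general ideas rather than the authors' implementation. It assumes a skeleton sequence stored as a (frames, joints, 3) array; the function names (`spatial_normalize`, `bone_features`, `temporal_derivatives`, `fuse_scores`), the abbreviated bone list, and the fusion weights are illustrative placeholders, not values taken from the paper.

```python
import numpy as np

# Hypothetical input: one skeleton sequence of T frames and V joints with 3D
# coordinates (V = 25 matches the NTU RGB+D joint layout used in the paper).
T, V = 64, 25
joints = np.random.rand(T, V, 3).astype(np.float32)

# Abbreviated (child, parent) bone list; the full NTU skeleton has 24 bones.
bone_pairs = [(1, 0), (2, 1), (3, 2), (5, 4), (6, 5)]

def spatial_normalize(x, root=0):
    """Recenter each frame on a root joint and rescale by the mean
    joint-to-root distance (one simple form of spatial normalization)."""
    centered = x - x[:, root:root + 1, :]
    scale = np.linalg.norm(centered, axis=-1).mean(axis=1, keepdims=True)
    return centered / (scale[..., None] + 1e-6)

def bone_features(x, pairs):
    """Bone vectors, bone lengths, and angles between consecutive bones."""
    vecs = np.stack([x[:, c] - x[:, p] for c, p in pairs], axis=1)   # (T, B, 3)
    lengths = np.linalg.norm(vecs, axis=-1, keepdims=True)           # (T, B, 1)
    unit = vecs / (lengths + 1e-6)
    cos_a = np.sum(unit[:, :-1] * unit[:, 1:], axis=-1, keepdims=True)
    angles = np.arccos(np.clip(cos_a, -1.0, 1.0))                    # (T, B-1, 1)
    return vecs, lengths, angles

def temporal_derivatives(x):
    """First- and second-order temporal differences (velocity, acceleration)."""
    vel = np.diff(x, n=1, axis=0, prepend=x[:1])
    acc = np.diff(vel, n=1, axis=0, prepend=vel[:1])
    return vel, acc

def fuse_scores(stream_scores, weights):
    """Weighted late fusion of per-stream class scores."""
    fused = sum(w * s for w, s in zip(weights, stream_scores))
    return int(np.argmax(fused))

x_norm = spatial_normalize(joints)
bones, lengths, angles = bone_features(x_norm, bone_pairs)
velocity, acceleration = temporal_derivatives(x_norm)

# Pretend each stream's network produced class scores for NTU RGB+D 60.
num_classes = 60
scores = [np.random.rand(num_classes) for _ in range(3)]  # joint, bone, motion
predicted = fuse_scores(scores, weights=[0.5, 0.3, 0.2])  # weights are illustrative
```

In this sketch the weighted sum at the end corresponds to the late, final-layer fusion described for EMS-AGCN; the MB-AGCN variant would instead mix the per-branch feature maps adaptively inside the network, before the classifier, rather than combining output scores.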
About the journal:
PLOS ONE is an international, peer-reviewed, open-access, online publication. PLOS ONE welcomes reports on primary research from any scientific discipline. It provides:
* Open-access—freely accessible online, authors retain copyright
* Fast publication times
* Peer review by expert, practicing researchers
* Post-publication tools to indicate quality and impact
* Community-based dialogue on articles
* Worldwide media coverage