Mohammed H Al-Hakimi, Ibrar Ahmed, Muhammad Haseeb, Taha H Rassem, Fahmi H Quradaa, Rashad S Almoqbily
{"title":"基于骨架动作识别的高阶特征增强时空图卷积网络。","authors":"Mohammed H Al-Hakimi, Ibrar Ahmed, Muhammad Haseeb, Taha H Rassem, Fahmi H Quradaa, Rashad S Almoqbily","doi":"10.1371/journal.pone.0332815","DOIUrl":null,"url":null,"abstract":"<p><p>Skeleton-based action recognition has emerged as a promising field within computer vision, offering structured representations of human motion. While existing Graph Convolutional Network (GCN)-based approaches primarily rely on raw 3D joint coordinates, these representations fail to capture higher-order spatial and temporal dependencies critical for distinguishing fine-grained actions. In this study, we introduce novel geometric features for joints, bones, and motion streams, including multi-level spatial normalization, higher-order temporal derivatives, and bone-structure encoding through lengths, angles, and anatomical distances. These enriched features explicitly model kinematic and structural relationships, enabling the capture of subtle motion dynamics and discriminative patterns. Building on this, we propose two architectures: (i) an Enhanced Multi-Stream AGCN (EMS-AGCN) that integrates joint, bone, and motion features via a weighted fusion at the final layer, and (ii) a Multi-Branch AGCN (MB-AGCN) where features are processed in independent branches and fused adaptively at an early layer. Comprehensive experiments on the NTU-RGB+D 60 benchmark demonstrate the effectiveness of our approach: EMS-AGCN achieves 96.2% accuracy and MB-AGCN attains 95.5%, both surpassing state-of-the-art methods. These findings confirm that incorporating higher-order geometric features alongside adaptive fusion mechanisms substantially improves skeleton-based action recognition.</p>","PeriodicalId":20189,"journal":{"name":"PLoS ONE","volume":"20 10","pages":"e0332815"},"PeriodicalIF":2.6000,"publicationDate":"2025-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12510661/pdf/","citationCount":"0","resultStr":"{\"title\":\"An enhanced spatial-temporal graph convolution network with high order features for skeleton-based action recognition.\",\"authors\":\"Mohammed H Al-Hakimi, Ibrar Ahmed, Muhammad Haseeb, Taha H Rassem, Fahmi H Quradaa, Rashad S Almoqbily\",\"doi\":\"10.1371/journal.pone.0332815\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Skeleton-based action recognition has emerged as a promising field within computer vision, offering structured representations of human motion. While existing Graph Convolutional Network (GCN)-based approaches primarily rely on raw 3D joint coordinates, these representations fail to capture higher-order spatial and temporal dependencies critical for distinguishing fine-grained actions. In this study, we introduce novel geometric features for joints, bones, and motion streams, including multi-level spatial normalization, higher-order temporal derivatives, and bone-structure encoding through lengths, angles, and anatomical distances. These enriched features explicitly model kinematic and structural relationships, enabling the capture of subtle motion dynamics and discriminative patterns. Building on this, we propose two architectures: (i) an Enhanced Multi-Stream AGCN (EMS-AGCN) that integrates joint, bone, and motion features via a weighted fusion at the final layer, and (ii) a Multi-Branch AGCN (MB-AGCN) where features are processed in independent branches and fused adaptively at an early layer. 
Comprehensive experiments on the NTU-RGB+D 60 benchmark demonstrate the effectiveness of our approach: EMS-AGCN achieves 96.2% accuracy and MB-AGCN attains 95.5%, both surpassing state-of-the-art methods. These findings confirm that incorporating higher-order geometric features alongside adaptive fusion mechanisms substantially improves skeleton-based action recognition.</p>\",\"PeriodicalId\":20189,\"journal\":{\"name\":\"PLoS ONE\",\"volume\":\"20 10\",\"pages\":\"e0332815\"},\"PeriodicalIF\":2.6000,\"publicationDate\":\"2025-10-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12510661/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"PLoS ONE\",\"FirstCategoryId\":\"103\",\"ListUrlMain\":\"https://doi.org/10.1371/journal.pone.0332815\",\"RegionNum\":3,\"RegionCategory\":\"综合性期刊\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q1\",\"JCRName\":\"MULTIDISCIPLINARY SCIENCES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"PLoS ONE","FirstCategoryId":"103","ListUrlMain":"https://doi.org/10.1371/journal.pone.0332815","RegionNum":3,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q1","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}
An enhanced spatial-temporal graph convolution network with high order features for skeleton-based action recognition.
Skeleton-based action recognition has emerged as a promising field within computer vision, offering structured representations of human motion. Existing Graph Convolutional Network (GCN)-based approaches rely primarily on raw 3D joint coordinates, a representation that fails to capture the higher-order spatial and temporal dependencies critical for distinguishing fine-grained actions. In this study, we introduce novel geometric features for joints, bones, and motion streams, including multi-level spatial normalization, higher-order temporal derivatives, and bone-structure encoding through lengths, angles, and anatomical distances. These enriched features explicitly model kinematic and structural relationships, enabling the capture of subtle motion dynamics and discriminative patterns. Building on this, we propose two architectures: (i) an Enhanced Multi-Stream AGCN (EMS-AGCN) that integrates joint, bone, and motion features via weighted fusion at the final layer, and (ii) a Multi-Branch AGCN (MB-AGCN) in which features are processed in independent branches and fused adaptively at an early layer. Comprehensive experiments on the NTU RGB+D 60 benchmark demonstrate the effectiveness of our approach: EMS-AGCN achieves 96.2% accuracy and MB-AGCN attains 95.5%, both surpassing state-of-the-art methods. These findings confirm that incorporating higher-order geometric features alongside adaptive fusion mechanisms substantially improves skeleton-based action recognition.
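The abstract describes the feature design and fusion strategy only at a high level, so the following is a minimal Python/NumPy sketch of the general ideas rather than the authors' implementation. It assumes a skeleton sequence stored as a (frames, joints, 3) array; the function names (`spatial_normalize`, `bone_features`, `temporal_derivatives`, `fuse_scores`), the abbreviated bone list, and the fusion weights are illustrative placeholders, not values taken from the paper.

```python
import numpy as np

# Hypothetical input: one skeleton sequence of T frames and V joints with 3D
# coordinates (V = 25 matches the NTU RGB+D joint layout used in the paper).
T, V = 64, 25
joints = np.random.rand(T, V, 3).astype(np.float32)

# Abbreviated (child, parent) bone list; the full NTU skeleton has 24 bones.
bone_pairs = [(1, 0), (2, 1), (3, 2), (5, 4), (6, 5)]

def spatial_normalize(x, root=0):
    """Recenter each frame on a root joint and rescale by the mean
    joint-to-root distance (one simple form of spatial normalization)."""
    centered = x - x[:, root:root + 1, :]
    scale = np.linalg.norm(centered, axis=-1).mean(axis=1, keepdims=True)
    return centered / (scale[..., None] + 1e-6)

def bone_features(x, pairs):
    """Bone vectors, bone lengths, and angles between consecutive bones."""
    vecs = np.stack([x[:, c] - x[:, p] for c, p in pairs], axis=1)   # (T, B, 3)
    lengths = np.linalg.norm(vecs, axis=-1, keepdims=True)           # (T, B, 1)
    unit = vecs / (lengths + 1e-6)
    cos_a = np.sum(unit[:, :-1] * unit[:, 1:], axis=-1, keepdims=True)
    angles = np.arccos(np.clip(cos_a, -1.0, 1.0))                    # (T, B-1, 1)
    return vecs, lengths, angles

def temporal_derivatives(x):
    """First- and second-order temporal differences (velocity, acceleration)."""
    vel = np.diff(x, n=1, axis=0, prepend=x[:1])
    acc = np.diff(vel, n=1, axis=0, prepend=vel[:1])
    return vel, acc

def fuse_scores(stream_scores, weights):
    """Weighted late fusion of per-stream class scores."""
    fused = sum(w * s for w, s in zip(weights, stream_scores))
    return int(np.argmax(fused))

x_norm = spatial_normalize(joints)
bones, lengths, angles = bone_features(x_norm, bone_pairs)
velocity, acceleration = temporal_derivatives(x_norm)

# Pretend each stream's network produced class scores for NTU RGB+D 60.
num_classes = 60
scores = [np.random.rand(num_classes) for _ in range(3)]  # joint, bone, motion
predicted = fuse_scores(scores, weights=[0.5, 0.3, 0.2])  # weights are illustrative
```

In this sketch the weighted sum at the end corresponds to the late, final-layer fusion described for EMS-AGCN; the MB-AGCN variant would instead mix the per-branch feature maps adaptively inside the network, before the classifier, rather than combining output scores.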
About the journal:
PLOS ONE is an international, peer-reviewed, open-access, online publication. PLOS ONE welcomes reports on primary research from any scientific discipline. It provides:
* Open-access—freely accessible online, authors retain copyright
* Fast publication times
* Peer review by expert, practicing researchers
* Post-publication tools to indicate quality and impact
* Community-based dialogue on articles
* Worldwide media coverage