Extended multi-stream temporal-attention module for skeleton-based human action recognition (HAR)

Impact Factor: 9.0 · CAS Tier 1 (Psychology) · JCR Q1, Psychology, Experimental
Faisal Mehmood, Xin Guo, Enqing Chen, Muhammad Azeem Akbar, Arif Ali Khan, Sami Ullah
{"title":"基于骨架的人类动作识别(HAR)的扩展多流时间关注模块","authors":"Faisal Mehmood ,&nbsp;Xin Guo ,&nbsp;Enqing Chen ,&nbsp;Muhammad Azeem Akbar ,&nbsp;Arif Ali Khan ,&nbsp;Sami Ullah","doi":"10.1016/j.chb.2024.108482","DOIUrl":null,"url":null,"abstract":"<div><div>Graph convolutional networks (GCNs) are an effective skeleton-based human action recognition (HAR) technique. GCNs enable the specification of CNNs to a non-Euclidean frame that is more flexible. The previous GCN-based models still have a lot of issues: (I) The graph structure is the same for all model layers and input data. GCN model's hierarchical structure and human action recognition input diversity make this a problematic approach; (II) Bone length and orientation are understudied due to their significance and variance in HAR. For this purpose, we introduce an Extended Multi-stream Temporal-attention Adaptive GCN (EMS-TAGCN). By training the network topology of the proposed model either consistently or independently according to the input data, this data-based technique makes graphs more flexible and faster to adapt to a new dataset. A spatial, temporal, and channel attention module helps the adaptive graph convolutional layer focus on joints, frames, and features. Hence, a multi-stream framework representing bones, joints, and their motion enhances recognition accuracy. Our proposed model outperforms the NTU RGBD for CS and CV by 0.6% and 1.4%, respectively, while Kinetics-skeleton Top-1 and Top-5 are 1.4% improved, UCF-101 has improved 2.34% accuracy and HMDB-51 dataset has significantly improved 1.8% accuracy. According to the results, our model has performed better than the other models. Our model consistently outperformed other models, and the results were statistically significant that demonstrating the superiority of our model for the task of HAR and its ability to provide the most reliable and accurate results.</div></div>","PeriodicalId":48471,"journal":{"name":"Computers in Human Behavior","volume":"163 ","pages":"Article 108482"},"PeriodicalIF":9.0000,"publicationDate":"2024-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Extended multi-stream temporal-attention module for skeleton-based human action recognition (HAR)\",\"authors\":\"Faisal Mehmood ,&nbsp;Xin Guo ,&nbsp;Enqing Chen ,&nbsp;Muhammad Azeem Akbar ,&nbsp;Arif Ali Khan ,&nbsp;Sami Ullah\",\"doi\":\"10.1016/j.chb.2024.108482\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Graph convolutional networks (GCNs) are an effective skeleton-based human action recognition (HAR) technique. GCNs enable the specification of CNNs to a non-Euclidean frame that is more flexible. The previous GCN-based models still have a lot of issues: (I) The graph structure is the same for all model layers and input data. GCN model's hierarchical structure and human action recognition input diversity make this a problematic approach; (II) Bone length and orientation are understudied due to their significance and variance in HAR. For this purpose, we introduce an Extended Multi-stream Temporal-attention Adaptive GCN (EMS-TAGCN). By training the network topology of the proposed model either consistently or independently according to the input data, this data-based technique makes graphs more flexible and faster to adapt to a new dataset. A spatial, temporal, and channel attention module helps the adaptive graph convolutional layer focus on joints, frames, and features. 
Hence, a multi-stream framework representing bones, joints, and their motion enhances recognition accuracy. Our proposed model outperforms the NTU RGBD for CS and CV by 0.6% and 1.4%, respectively, while Kinetics-skeleton Top-1 and Top-5 are 1.4% improved, UCF-101 has improved 2.34% accuracy and HMDB-51 dataset has significantly improved 1.8% accuracy. According to the results, our model has performed better than the other models. Our model consistently outperformed other models, and the results were statistically significant that demonstrating the superiority of our model for the task of HAR and its ability to provide the most reliable and accurate results.</div></div>\",\"PeriodicalId\":48471,\"journal\":{\"name\":\"Computers in Human Behavior\",\"volume\":\"163 \",\"pages\":\"Article 108482\"},\"PeriodicalIF\":9.0000,\"publicationDate\":\"2024-10-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computers in Human Behavior\",\"FirstCategoryId\":\"102\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0747563224003509\",\"RegionNum\":1,\"RegionCategory\":\"心理学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"PSYCHOLOGY, EXPERIMENTAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers in Human Behavior","FirstCategoryId":"102","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0747563224003509","RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"PSYCHOLOGY, EXPERIMENTAL","Score":null,"Total":0}
Citations: 0

Abstract

Graph convolutional networks (GCNs) are an effective technique for skeleton-based human action recognition (HAR): they generalize CNNs to non-Euclidean structures and are therefore more flexible. However, previous GCN-based models still have notable issues: (I) the graph structure is the same for all model layers and all input samples, which is problematic given the hierarchical structure of GCN models and the diversity of HAR inputs; (II) bone length and orientation remain understudied despite their significance and variability in HAR. To address these issues, we introduce an Extended Multi-stream Temporal-attention Adaptive GCN (EMS-TAGCN). By learning the network topology either uniformly across layers or individually per layer according to the input data, this data-driven approach makes the graph more flexible and faster to adapt to new datasets. Spatial, temporal, and channel attention modules help the adaptive graph convolutional layers focus on the most informative joints, frames, and feature channels, and a multi-stream framework representing joints, bones, and their motion further enhances recognition accuracy. The proposed model outperforms prior results on NTU RGB+D by 0.6% (cross-subject, CS) and 1.4% (cross-view, CV), improves Kinetics-Skeleton Top-1 and Top-5 accuracy by 1.4%, and improves accuracy by 2.34% on UCF-101 and 1.8% on HMDB-51. Across all benchmarks the model consistently outperformed competing models, and the improvements were statistically significant, demonstrating its suitability for HAR and its ability to deliver reliable and accurate results.
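The page does not include an implementation, so the following PyTorch-style sketch is only a rough illustration of the kind of layer the abstract describes: a spatial graph convolution whose adjacency matrix is partly learned from data, combined with lightweight spatial, temporal, and channel attention, plus the usual way bone and motion streams are derived from joint coordinates. All names (`AdaptiveGCNBlock`, `joints_to_bones`, `motion_stream`) and design details are assumptions for illustration, not the authors' EMS-TAGCN implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveGCNBlock(nn.Module):
    """Hypothetical spatial graph convolution with a learnable adjacency
    offset and simple spatial / temporal / channel attention."""

    def __init__(self, in_channels, out_channels, A):
        super().__init__()
        # Fixed skeleton adjacency (V x V) plus a learnable, data-driven offset.
        self.register_buffer("A_fixed", A)
        self.A_learned = nn.Parameter(torch.zeros_like(A))
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1)

        # Channel attention (squeeze-and-excitation style).
        hidden = max(out_channels // 4, 1)
        self.fc_ch = nn.Sequential(
            nn.Linear(out_channels, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, out_channels),
            nn.Sigmoid(),
        )
        self.conv_t = nn.Conv2d(out_channels, 1, kernel_size=1)  # one weight per frame
        self.conv_s = nn.Conv2d(out_channels, 1, kernel_size=1)  # one weight per joint

    def forward(self, x):
        # x: (N, C, T, V) = batch, channels, frames, joints
        A = self.A_fixed + self.A_learned                 # adaptive graph
        x = torch.einsum("nctv,vw->nctw", x, A)           # aggregate neighbouring joints
        x = self.conv(x)                                  # (N, C_out, T, V)

        ch = self.fc_ch(x.mean(dim=(2, 3)))               # channel attention, (N, C_out)
        x = x * ch[:, :, None, None]
        t_att = torch.sigmoid(self.conv_t(x).mean(dim=3, keepdim=True))  # (N, 1, T, 1)
        x = x * t_att
        s_att = torch.sigmoid(self.conv_s(x).mean(dim=2, keepdim=True))  # (N, 1, 1, V)
        return F.relu(x * s_att)


def joints_to_bones(joints, parents):
    """Bone stream: vector from each joint's parent joint to the joint itself."""
    # joints: (N, C, T, V); parents: parent index for every joint (root maps to itself).
    return joints - joints[:, :, :, parents]


def motion_stream(x):
    """Motion stream: frame-to-frame differences, zero-padded to keep length T."""
    return F.pad(x[:, :, 1:] - x[:, :, :-1], (0, 0, 0, 1))


# Example usage (2D skeleton with 3 joints in a chain 0-1-2, 4 frames):
A = torch.tensor([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=torch.float32)
block = AdaptiveGCNBlock(in_channels=2, out_channels=16, A=A)
joints = torch.randn(8, 2, 4, 3)                     # (N, C, T, V)
bones = joints_to_bones(joints, parents=[0, 0, 1])   # bone vectors
out = block(motion_stream(bones))                    # bone-motion stream features
```

In a multi-stream setup of this kind, each stream (joints, bones, joint motion, bone motion) is typically trained as a separate network and the per-class scores are fused at the end, for example by a weighted sum of softmax outputs; how EMS-TAGCN fuses its streams is not specified on this page.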
Source journal: Computers in Human Behavior
CiteScore: 19.10
Self-citation rate: 4.00%
Publications: 381
Review time: 40 days
Journal introduction: Computers in Human Behavior is a scholarly journal that explores the psychological aspects of computer use. It covers original theoretical works, research reports, literature reviews, and software and book reviews. The journal examines both the use of computers in psychology, psychiatry, and related fields, and the psychological impact of computer use on individuals, groups, and society. Articles discuss topics such as professional practice, training, research, human development, learning, cognition, personality, and social interactions. It focuses on human interactions with computers, considering the computer as a medium through which human behaviors are shaped and expressed. Professionals interested in the psychological aspects of computer use will find this journal valuable, even with limited knowledge of computers.