自适应陷阱:探索基于骨架的动作识别中自适应的有效性

IF 8.4 1区 计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS
Qiguang Miao;Wentian Xin;Ruyi Liu;Yi Liu;Mengyao Wu;Cheng Shi;Chi-Man Pun
{"title":"自适应陷阱:探索基于骨架的动作识别中自适应的有效性","authors":"Qiguang Miao;Wentian Xin;Ruyi Liu;Yi Liu;Mengyao Wu;Cheng Shi;Chi-Man Pun","doi":"10.1109/TMM.2024.3521774","DOIUrl":null,"url":null,"abstract":"Graph convolution networks (GCNs) have achieved remarkable performance in skeleton-based action recognition by exploiting the adjacency topology of body representation. However, the adaptive strategy adopted by the previous methods to construct the adjacency matrix is not balanced between the performance and the computational cost. We assume this concept of <italic>Adaptive Trap</i>, which can be replaced by multiple autonomous submodules, thereby simultaneously enhancing the dynamic joint representation and effectively reducing network resources. To effectuate the substitution of the adaptive model, we unveil two distinct strategies, both yielding comparable effects. (1) Optimization. <italic>Individuality and Commonality GCNs (IC-GCNs)</i> is proposed to specifically optimize the construction method of the associativity adjacency matrix for adaptive processing. The uniqueness and co-occurrence between different joint points and frames in the skeleton topology are effectively captured through methodologies like preferential fusion of physical information, extreme compression of multi-dimensional channels, and simplification of self-attention mechanism. (2) Replacement. <italic>Auto-Learning GCNs (AL-GCNs)</i> is proposed to boldly remove popular adaptive modules and cleverly utilize human key points as motion compensation to provide dynamic correlation support. AL-GCNs construct a fully learnable group adjacency matrix in both spatial and temporal dimensions, resulting in an elegant and efficient GCN-based model. In addition, three effective tricks for skeleton-based action recognition (Skip-Block, Bayesian Weight Selection Algorithm, and Simplified Dimensional Attention) are exposed and analyzed in this paper. Finally, we employ the variable channel and grouping method to explore the hardware resource bound of the two proposed models. IC-GCN and AL-GCN exhibit impressive performance across NTU-RGB+D 60, NTU-RGB+D 120, NW-UCLA, and UAV-Human datasets, with an exceptional parameter-cost ratio.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"56-71"},"PeriodicalIF":8.4000,"publicationDate":"2024-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Adaptive Pitfall: Exploring the Effectiveness of Adaptation in Skeleton-Based Action Recognition\",\"authors\":\"Qiguang Miao;Wentian Xin;Ruyi Liu;Yi Liu;Mengyao Wu;Cheng Shi;Chi-Man Pun\",\"doi\":\"10.1109/TMM.2024.3521774\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Graph convolution networks (GCNs) have achieved remarkable performance in skeleton-based action recognition by exploiting the adjacency topology of body representation. However, the adaptive strategy adopted by the previous methods to construct the adjacency matrix is not balanced between the performance and the computational cost. We assume this concept of <italic>Adaptive Trap</i>, which can be replaced by multiple autonomous submodules, thereby simultaneously enhancing the dynamic joint representation and effectively reducing network resources. To effectuate the substitution of the adaptive model, we unveil two distinct strategies, both yielding comparable effects. (1) Optimization. <italic>Individuality and Commonality GCNs (IC-GCNs)</i> is proposed to specifically optimize the construction method of the associativity adjacency matrix for adaptive processing. The uniqueness and co-occurrence between different joint points and frames in the skeleton topology are effectively captured through methodologies like preferential fusion of physical information, extreme compression of multi-dimensional channels, and simplification of self-attention mechanism. (2) Replacement. <italic>Auto-Learning GCNs (AL-GCNs)</i> is proposed to boldly remove popular adaptive modules and cleverly utilize human key points as motion compensation to provide dynamic correlation support. AL-GCNs construct a fully learnable group adjacency matrix in both spatial and temporal dimensions, resulting in an elegant and efficient GCN-based model. In addition, three effective tricks for skeleton-based action recognition (Skip-Block, Bayesian Weight Selection Algorithm, and Simplified Dimensional Attention) are exposed and analyzed in this paper. Finally, we employ the variable channel and grouping method to explore the hardware resource bound of the two proposed models. IC-GCN and AL-GCN exhibit impressive performance across NTU-RGB+D 60, NTU-RGB+D 120, NW-UCLA, and UAV-Human datasets, with an exceptional parameter-cost ratio.\",\"PeriodicalId\":13273,\"journal\":{\"name\":\"IEEE Transactions on Multimedia\",\"volume\":\"27 \",\"pages\":\"56-71\"},\"PeriodicalIF\":8.4000,\"publicationDate\":\"2024-12-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Multimedia\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10814651/\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Multimedia","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10814651/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

摘要

图卷积网络(GCNs)利用身体表示的邻接拓扑在基于骨架的动作识别中取得了显著的性能。然而,之前的方法所采用的自适应策略在构造邻接矩阵时,并没有在性能和计算成本之间取得平衡。我们假设了自适应陷阱的概念,它可以被多个自治子模块取代,从而同时增强了动态联合表示,有效地减少了网络资源。为了实现自适应模型的替代,我们揭示了两种不同的策略,两者都产生了可比较的效果。(1)优化。个性共性GCNs (IC-GCNs)是针对自适应处理中结合律邻接矩阵的构造方法进行优化而提出的。通过物理信息的优先融合、多维通道的极度压缩和自关注机制的简化等方法,有效地捕获了骨架拓扑中不同连接点和框架之间的唯一性和共现性。(2)更换。自动学习GCNs (Auto-Learning GCNs, AL-GCNs)大胆去除流行的自适应模块,巧妙地利用人体关键点作为运动补偿,提供动态相关支持。al - gcn在空间和时间维度上构建了一个完全可学习的群邻接矩阵,从而得到了一个优雅高效的基于gcn的模型。此外,本文还对基于骨架的动作识别的三种有效技巧(Skip-Block、贝叶斯权重选择算法和简化维度注意算法)进行了揭示和分析。最后,我们采用可变通道和分组方法来探索两种模型的硬件资源边界。IC-GCN和AL-GCN在NTU-RGB+ d60、NTU-RGB+ d120、NW-UCLA和UAV-Human数据集上表现出令人印象深刻的性能,具有卓越的参数成本比。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Adaptive Pitfall: Exploring the Effectiveness of Adaptation in Skeleton-Based Action Recognition
Graph convolution networks (GCNs) have achieved remarkable performance in skeleton-based action recognition by exploiting the adjacency topology of body representation. However, the adaptive strategy adopted by the previous methods to construct the adjacency matrix is not balanced between the performance and the computational cost. We assume this concept of Adaptive Trap, which can be replaced by multiple autonomous submodules, thereby simultaneously enhancing the dynamic joint representation and effectively reducing network resources. To effectuate the substitution of the adaptive model, we unveil two distinct strategies, both yielding comparable effects. (1) Optimization. Individuality and Commonality GCNs (IC-GCNs) is proposed to specifically optimize the construction method of the associativity adjacency matrix for adaptive processing. The uniqueness and co-occurrence between different joint points and frames in the skeleton topology are effectively captured through methodologies like preferential fusion of physical information, extreme compression of multi-dimensional channels, and simplification of self-attention mechanism. (2) Replacement. Auto-Learning GCNs (AL-GCNs) is proposed to boldly remove popular adaptive modules and cleverly utilize human key points as motion compensation to provide dynamic correlation support. AL-GCNs construct a fully learnable group adjacency matrix in both spatial and temporal dimensions, resulting in an elegant and efficient GCN-based model. In addition, three effective tricks for skeleton-based action recognition (Skip-Block, Bayesian Weight Selection Algorithm, and Simplified Dimensional Attention) are exposed and analyzed in this paper. Finally, we employ the variable channel and grouping method to explore the hardware resource bound of the two proposed models. IC-GCN and AL-GCN exhibit impressive performance across NTU-RGB+D 60, NTU-RGB+D 120, NW-UCLA, and UAV-Human datasets, with an exceptional parameter-cost ratio.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
IEEE Transactions on Multimedia
IEEE Transactions on Multimedia 工程技术-电信学
CiteScore
11.70
自引率
11.00%
发文量
576
审稿时长
5.5 months
期刊介绍: The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信