MDSI: Pluggable Multi-strategy Decoupling with Semantic Integration for RGB-D Gesture Recognition

Impact factor 7.5 · Zone 1 (Computer Science) · JCR Q1: Computer Science, Artificial Intelligence
Fengyi Fang, Zihan Liao, Zhehan Kan, Guijin Wang, Wenming Yang
Pattern Recognition, vol. 166, Article 111653. Published 2025-04-10.
DOI: 10.1016/j.patcog.2025.111653
https://www.sciencedirect.com/science/article/pii/S0031320325003139

Abstract

Gestures are visually complex: they contain both task-relevant cues, such as hand shapes, and task-irrelevant elements, such as backgrounds and performer appearance. Despite progress in RGB-D-based gesture recognition, two primary challenges persist: (i) Information Redundancy (IR), which hinders task-relevant feature extraction in the entangled feature space and misleads recognition; and (ii) Information Absence (IA), which makes visually similar instances harder to distinguish. To alleviate these drawbacks, we propose a pluggable Multi-strategy Decoupling with Semantic Integration methodology, termed MDSI, for RGB-D gesture recognition. For IR, we introduce a Multi-strategy Decoupling Network (MDN) that separates pose-motion and spatial-temporal-channel features across modalities, thereby mitigating redundant information. For IA, we introduce a Semantic Integration Network (SIN), which integrates natural language modeling through semantic filtering and semantic label smoothing, enhancing the model's semantic understanding and knowledge integration. MDSI's pluggable architecture allows it to be integrated into various RGB-D-based gesture recognition methods with minimal computational overhead. Experiments on two public datasets demonstrate that our approach provides better feature representations and achieves better performance than state-of-the-art methods.
Source journal
Pattern Recognition (Engineering & Technology: Engineering, Electrical & Electronic)
CiteScore: 14.40
Self-citation rate: 16.20%
Articles published: 683
Review time: 5.6 months
About the journal: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.