MDSI: Pluggable Multi-strategy Decoupling with Semantic Integration for RGB-D Gesture Recognition
Fengyi Fang, Zihan Liao, Zhehan Kan, Guijin Wang, Wenming Yang
Pattern Recognition, Volume 166, Article 111653. Published 2025-04-10. Journal Article.
DOI: 10.1016/j.patcog.2025.111653
URL: https://www.sciencedirect.com/science/article/pii/S0031320325003139
Journal impact factor: 7.5 (JCR Q1, Computer Science, Artificial Intelligence)
Citation count: 0
Abstract
Gestures encompass intricate visual representations, containing both task-relevant cues such as hand shapes and task-irrelevant elements such as backgrounds and performer appearances. Despite progress in RGB-D-based gesture recognition, two primary challenges persist: (i) Information Redundancy (IR), which hinders task-relevant feature extraction in the entangled space and misleads recognition; (ii) Information Absence (IA), which makes visually similar instances harder to distinguish. To alleviate these drawbacks, we propose a pluggable Multi-strategy Decoupling with Semantic Integration methodology, termed MDSI, for RGB-D gesture recognition. For IR, we introduce a Multi-strategy Decoupling Network (MDN) that separates pose-motion and spatial-temporal-channel features across modalities, thereby mitigating redundant information. For IA, we introduce the Semantic Integration Network (SIN), which integrates natural language modeling through semantic filtering and semantic label smoothing, enhancing the model’s semantic understanding and knowledge integration. MDSI’s pluggable architecture allows integration into various RGB-D-based gesture recognition methods with minimal computational overhead. Experiments on two public datasets demonstrate that our approach provides better feature representation and achieves better performance than state-of-the-art methods.
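The abstract names semantic label smoothing but does not say how SIN computes it. As a rough illustration only, the sketch below shows one generic way such smoothing can be built: the true class keeps most of the probability mass, and the remainder is shared among the other classes according to how similar their label embeddings are to the true label. The function names, the toy embeddings, and the `epsilon` and `temperature` values are all assumptions made for this example and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def semantic_soft_targets(label_embeddings, true_idx, epsilon=0.1, temperature=0.5):
    """Build a soft target distribution for one sample (hypothetical helper).

    The true class keeps 1 - epsilon. The remaining epsilon is split among
    the other classes in proportion to exp(cosine similarity / temperature),
    so semantically closer labels receive more of the smoothing mass.
    """
    anchor = label_embeddings[true_idx]
    others = [i for i in range(len(label_embeddings)) if i != true_idx]
    weights = [
        math.exp(cosine(anchor, label_embeddings[i]) / temperature) for i in others
    ]
    total = sum(weights)
    targets = [0.0] * len(label_embeddings)
    targets[true_idx] = 1.0 - epsilon
    for i, w in zip(others, weights):
        targets[i] = epsilon * w / total
    return targets


def soft_cross_entropy(logits, targets):
    """Cross-entropy between a soft target distribution and softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(t * (x - log_z) for t, x in zip(targets, logits))


# Toy, made-up label embeddings standing in for text-encoder outputs.
LABEL_EMBEDDINGS = [
    [1.0, 0.0, 0.0],  # e.g. "swipe left"
    [0.9, 0.1, 0.0],  # e.g. "swipe right" (close to the label above)
    [0.0, 0.0, 1.0],  # e.g. "thumbs up" (unrelated)
]

targets = semantic_soft_targets(LABEL_EMBEDDINGS, true_idx=0)
loss = soft_cross_entropy([2.0, 1.0, -1.0], targets)
```

With uniform label smoothing, every wrong class would receive the same share of `epsilon`. Here the label closest in meaning to the true one receives the larger share, which is one plausible reading of how language-derived knowledge could help with visually similar gestures.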
About the journal:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.