{"title":"基于多模态一致性的连续手语识别优化框架","authors":"Neena Aloysius;Geetha M;Prema Nedungadi","doi":"10.1109/OJCS.2025.3564828","DOIUrl":null,"url":null,"abstract":"This study introduces Efficient ConSignformer, a novel framework advancing Continuous Sign Language Recognition (CSLR) by optimizing the Conformer-based CSLR model, ConSignformer. Central to this advancement is the Sign Query Attention (SQA) module, a computationally efficient self-attention mechanism that enhances both performance and scalability, resulting in the Efficient Conformer. Efficient ConSignformer integrates video embeddings from dual-modal CNN pipelines that process heatmaps and RGB videos, along with temporal learning layers tailored for each modality. These embeddings are further refined using the Efficient Conformer for the fused data from two modalities. To improve recognition accuracy, we employ an innovative task-adaptive supervised pretraining strategy for Efficient Conformer on a curated dataset of continuous Indian Sign Language (ISL). This strategy enables the model to effectively capture intricate data relationships during end-to-end training. Experimental results highlight the significant contributions of the SQA module and the pretraining strategy, with our model achieving competitive performance on benchmark datasets, PHOENIX-2014 and PHOENIX-2014 T. Notably, Efficient ConSignformer excels in recognizing longer sign sequences, leveraging a computationally lightweight Conformer backbone.","PeriodicalId":13205,"journal":{"name":"IEEE Open Journal of the Computer Society","volume":"6 ","pages":"739-749"},"PeriodicalIF":0.0000,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10978102","citationCount":"0","resultStr":"{\"title\":\"Optimized Multi-Modal Conformer-Based Framework for Continuous Sign Language Recognition\",\"authors\":\"Neena Aloysius;Geetha M;Prema Nedungadi\",\"doi\":\"10.1109/OJCS.2025.3564828\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This study introduces Efficient ConSignformer, a novel framework advancing Continuous Sign Language Recognition (CSLR) by optimizing the Conformer-based CSLR model, ConSignformer. Central to this advancement is the Sign Query Attention (SQA) module, a computationally efficient self-attention mechanism that enhances both performance and scalability, resulting in the Efficient Conformer. Efficient ConSignformer integrates video embeddings from dual-modal CNN pipelines that process heatmaps and RGB videos, along with temporal learning layers tailored for each modality. These embeddings are further refined using the Efficient Conformer for the fused data from two modalities. To improve recognition accuracy, we employ an innovative task-adaptive supervised pretraining strategy for Efficient Conformer on a curated dataset of continuous Indian Sign Language (ISL). This strategy enables the model to effectively capture intricate data relationships during end-to-end training. Experimental results highlight the significant contributions of the SQA module and the pretraining strategy, with our model achieving competitive performance on benchmark datasets, PHOENIX-2014 and PHOENIX-2014 T. 
Notably, Efficient ConSignformer excels in recognizing longer sign sequences, leveraging a computationally lightweight Conformer backbone.\",\"PeriodicalId\":13205,\"journal\":{\"name\":\"IEEE Open Journal of the Computer Society\",\"volume\":\"6 \",\"pages\":\"739-749\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-04-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10978102\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Open Journal of the Computer Society\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10978102/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Open Journal of the Computer Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10978102/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Optimized Multi-Modal Conformer-Based Framework for Continuous Sign Language Recognition
This study introduces Efficient ConSignformer, a novel framework that advances Continuous Sign Language Recognition (CSLR) by optimizing the Conformer-based CSLR model ConSignformer. Central to this advancement is the Sign Query Attention (SQA) module, a computationally efficient self-attention mechanism that improves both performance and scalability and yields the Efficient Conformer. Efficient ConSignformer integrates video embeddings from dual-modal CNN pipelines that process heatmaps and RGB videos, together with temporal learning layers tailored to each modality. The embeddings from the two modalities are fused and then further refined by the Efficient Conformer. To improve recognition accuracy, we employ an innovative task-adaptive supervised pretraining strategy for the Efficient Conformer on a curated dataset of continuous Indian Sign Language (ISL). This strategy enables the model to capture intricate data relationships effectively during end-to-end training. Experimental results highlight the significant contributions of the SQA module and the pretraining strategy, with our model achieving competitive performance on the benchmark datasets PHOENIX-2014 and PHOENIX-2014T. Notably, Efficient ConSignformer excels at recognizing longer sign sequences while relying on a computationally lightweight Conformer backbone.
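The abstract only outlines the architecture at a high level, so the following is a minimal PyTorch sketch of that pipeline under stated assumptions: the internals of the SQA module are not given here, so standard multi-head self-attention stands in for it; the concatenate-then-project fusion, all class names, feature dimensions, and the CTC objective (typical for CSLR, but not stated in the abstract) are illustrative assumptions, not the authors' implementation.

```python
# Sketch, under assumptions: RGB and heatmap CNN features pass through
# per-modality temporal layers, are fused, refined by a Conformer-style
# encoder, and scored by a CTC head over the gloss vocabulary.
import torch
import torch.nn as nn


class ConformerBlock(nn.Module):
    """Simplified Conformer block: half-step FFN, self-attention, depthwise conv, half-step FFN."""

    def __init__(self, dim: int, heads: int = 4, kernel: int = 31):
        super().__init__()
        self.ff1 = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 4 * dim),
                                 nn.SiLU(), nn.Linear(4 * dim, dim))
        self.norm_attn = nn.LayerNorm(dim)
        # Stand-in for the paper's Sign Query Attention (SQA) module.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_conv = nn.LayerNorm(dim)
        self.dwconv = nn.Conv1d(dim, dim, kernel, padding=kernel // 2, groups=dim)
        self.pwconv = nn.Conv1d(dim, dim, kernel_size=1)
        self.ff2 = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 4 * dim),
                                 nn.SiLU(), nn.Linear(4 * dim, dim))
        self.norm_out = nn.LayerNorm(dim)

    def forward(self, x):                                   # x: (B, T, dim)
        x = x + 0.5 * self.ff1(x)
        h = self.norm_attn(x)
        x = x + self.attn(h, h, h)[0]
        c = self.norm_conv(x).transpose(1, 2)               # (B, dim, T) for Conv1d
        c = self.pwconv(torch.relu(self.dwconv(c))).transpose(1, 2)
        x = x + c
        x = x + 0.5 * self.ff2(x)
        return self.norm_out(x)


class DualModalCSLRSketch(nn.Module):
    """Hypothetical fusion model: two feature streams -> fusion -> Conformer encoder -> CTC logits."""

    def __init__(self, feat_dim=512, dim=256, num_glosses=1296, blocks=2):
        super().__init__()
        # Per-modality temporal learning layers (1-D convolutions over frame features).
        self.temporal_rgb = nn.Conv1d(feat_dim, dim, kernel_size=5, padding=2)
        self.temporal_hm = nn.Conv1d(feat_dim, dim, kernel_size=5, padding=2)
        self.fuse = nn.Linear(2 * dim, dim)                  # assumed concat-then-project fusion
        self.encoder = nn.Sequential(*[ConformerBlock(dim) for _ in range(blocks)])
        self.ctc_head = nn.Linear(dim, num_glosses + 1)      # +1 for the CTC blank symbol

    def forward(self, rgb_feats, hm_feats):                  # each: (B, T, feat_dim) from a CNN backbone
        r = self.temporal_rgb(rgb_feats.transpose(1, 2)).transpose(1, 2)
        h = self.temporal_hm(hm_feats.transpose(1, 2)).transpose(1, 2)
        x = self.fuse(torch.cat([r, h], dim=-1))             # fused (B, T, dim) representation
        x = self.encoder(x)
        return self.ctc_head(x).log_softmax(-1)              # frame-level gloss log-probs for CTC decoding


if __name__ == "__main__":
    model = DualModalCSLRSketch()
    rgb = torch.randn(2, 120, 512)    # 2 clips, 120 frames, 512-dim RGB CNN features
    hm = torch.randn(2, 120, 512)     # matching heatmap-stream features
    print(model(rgb, hm).shape)       # -> torch.Size([2, 120, 1297])
```

The sketch mirrors the order of operations described above: fusion of the two modality embeddings happens before the Conformer-style encoder, so the encoder refines the joint representation rather than each stream separately.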