Advancing Continuous Sign Language Recognition Through Denoising Diffusion Transformer-Based Spatial-Temporal Enhancement
Authors: Suhail Muhammad Kamal, Yidong Chen, Shaozi Li
Journal: Concurrency and Computation-Practice & Experience, vol. 37, no. 4-5 (Journal Article)
Published: 2025-02-13
DOI: 10.1002/cpe.8385
URL: https://onlinelibrary.wiley.com/doi/10.1002/cpe.8385
Impact factor: 1.5; JCR: Q3 (Computer Science, Software Engineering)
Citation count: 0
Abstract
The intricate spatial-temporal dynamics and variability of sign language gestures pose significant challenges for Continuous Sign Language Recognition (CSLR) systems. Existing models often fail to capture these complexities accurately, which degrades recognition performance and causes frequent misalignments. To address these shortcomings, we introduce an approach that uses Denoising Diffusion Models (DDMs) to improve feature representation in the visual-sequential module of CSLR systems. Although originally intended for generative tasks, DDMs have shown strong potential for representation learning through a denoising process akin to that of Denoising Autoencoders. Our method incorporates a denoising diffusion transformer into the CSLR framework to refine spatial-temporal features, capitalizing on the ability of diffusion models to enhance representation quality. By conditionally denoising visual feature sequences, our approach increases the discriminative capability of the system. We also introduce an additional classifier, trained with Connectionist Temporal Classification (CTC) loss, to provide complementary supervision and further boost performance. Extensive experiments demonstrate that our method significantly improves CSLR accuracy by effectively capturing the subtle details of continuous sign language gestures and overcoming the representation limitations of current models.
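The abstract does not give the diffusion formulation, so the following is only a minimal sketch of the standard DDPM forward (noising) step applied to a toy sequence of per-frame features. It is not the authors' implementation. The linear beta schedule, the function names, and the toy feature values are all assumptions made for illustration. A denoising network (here, a transformer conditioned on the visual features) would typically be trained to recover the clean features or the noise from the noisy sequence.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (a common DDPM default, assumed here)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(features, t, alpha_bars, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps.

    features: list of frames, each a list of floats (hypothetical visual features).
    Returns (noisy_features, noise) so a denoiser can be supervised on either.
    """
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    noise = [[rng.gauss(0.0, 1.0) for _ in frame] for frame in features]
    noisy = [
        [signal_scale * x + noise_scale * e for x, e in zip(frame, eps)]
        for frame, eps in zip(features, noise)
    ]
    return noisy, noise


rng = random.Random(0)  # seeded for reproducibility
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
clean = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]  # 2 frames, 3-dim toy features
noisy, noise = add_noise(clean, t=10, alpha_bars=alpha_bars, rng=rng)
```

At small `t` the noisy sequence stays close to the clean features, while at the final step it is nearly pure Gaussian noise. This is the property that lets a denoiser be trained across a range of corruption levels.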
About the journal:
Concurrency and Computation: Practice and Experience (CCPE) publishes high-quality, original research papers, and authoritative research review papers, in the overlapping fields of:
Parallel and distributed computing;
High-performance computing;
Computational and data science;
Artificial intelligence and machine learning;
Big data applications, algorithms, and systems;
Network science;
Ontologies and semantics;
Security and privacy;
Cloud/edge/fog computing;
Green computing; and
Quantum computing.