Advancing Continuous Sign Language Recognition Through Denoising Diffusion Transformer-Based Spatial-Temporal Enhancement

IF 1.5; Zone 4, Computer Science; Q3, COMPUTER SCIENCE, SOFTWARE ENGINEERING

Concurrency and Computation-Practice & Experience Pub Date : 2025-02-13 DOI:10.1002/cpe.8385

Suhail Muhammad Kamal, Yidong Chen, Shaozi Li

Volume 37 (4-5). Full text: https://onlinelibrary.wiley.com/doi/10.1002/cpe.8385

Citations: 0

Abstract

The intricate spatial-temporal dynamics and variability of sign language gestures pose significant challenges for Continuous Sign Language Recognition (CSLR) systems. Existing models often fall short in accurately capturing these complexities, leading to performance issues and frequent misalignments. To address these shortcomings, we introduce a new approach that leverages Denoising Diffusion Models (DDMs) to improve feature representation in the visual-sequential module of CSLR systems. Originally intended for generative tasks, DDMs have shown strong potential in representation learning through a denoising process akin to Denoising Autoencoders. Our method incorporates a denoising diffusion transformer into the CSLR framework to refine spatial-temporal features, capitalizing on the ability of diffusion models to enhance representation quality. By conditionally denoising visual feature sequences, our approach increases the discriminative capability of the system. We also introduce an additional classifier, trained with Connectionist Temporal Classification (CTC) loss, to provide complementary supervision and further boost performance. Extensive experiments demonstrate that our method significantly improves CSLR accuracy by effectively capturing the subtle details of continuous sign language gestures and overcoming the representation limitations of current models.
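As a rough illustration of what "denoising visual feature sequences" involves, the sketch below applies the standard DDPM forward (noising) step to a toy sequence of frame features; a denoiser would then be trained to recover the clean features or the added noise. The abstract does not give the authors' noise schedule, conditioning scheme, or feature dimensions, so the linear schedule, the step count, the toy values, and all function names here are assumptions for illustration only.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Evenly spaced noise variances beta_1..beta_T (values are assumptions)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(features, t, alpha_bars, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps.

    features: list of frames, each a list of floats (hypothetical visual features).
    Returns (noisy_features, noise) so a denoiser could be trained on the pair.
    """
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    noise = [[rng.gauss(0.0, 1.0) for _ in frame] for frame in features]
    noisy = [
        [signal_scale * x + noise_scale * e for x, e in zip(frame, eps)]
        for frame, eps in zip(features, noise)
    ]
    return noisy, noise


# Toy sequence: 4 frames with 3-dimensional features (made-up values).
features = [[0.5, -1.0, 2.0], [0.1, 0.0, 0.3], [1.5, 1.5, -0.5], [0.0, 2.0, 1.0]]
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
noisy, noise = add_noise(features, t=500, alpha_bars=alpha_bars, rng=random.Random(0))
```

Larger values of `t` shrink the signal scale and grow the noise scale, so the denoiser sees progressively harder corruption of the same feature sequence.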


Source journal

Concurrency and Computation-Practice & Experience (Engineering & Technology - Computer Science: Theory & Methods)

CiteScore

5.00

Self-citation rate

10.00%

Articles published

664

Review time

9.6 months

Journal description: Concurrency and Computation: Practice and Experience (CCPE) publishes high-quality, original research papers, and authoritative research review papers, in the overlapping fields of: Parallel and distributed computing; High-performance computing; Computational and data science; Artificial intelligence and machine learning; Big data applications, algorithms, and systems; Network science; Ontologies and semantics; Security and privacy; Cloud/edge/fog computing; Green computing; and Quantum computing.