DiffSLT: Enhancing diversity in sign language translation via diffusion model

IF 3.3 · Region 3 (Computer Science) · JCR Q2 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

Pattern Recognition Letters Pub Date : 2025-06-09 DOI:10.1016/j.patrec.2025.06.008

JiHwan Moon , Jihoon Park , Jungeun Kim , Jongseong Bae , Hyeongwoo Jeon , Ha Young Kim

Pattern Recognition Letters, Volume 196, Pages 117-125. Article page: https://www.sciencedirect.com/science/article/pii/S0167865525002363

Citations: 0

Abstract

Sign language translation (SLT) is challenging because it converts sign language videos into natural language, crossing modalities. Previous studies have prioritized accuracy over diversity. However, diversity is crucial for handling lexical and syntactic ambiguities in machine translation, which suggests it could similarly benefit SLT. In this work, we propose DiffSLT, a gloss-free SLT framework that leverages a diffusion model to enable diverse translations while preserving sign language semantics. DiffSLT transforms random noise into the target latent representation, conditioned on the visual features of the input video. To enhance visual conditioning, we design a Guidance Fusion Module, which integrates the multi-level spatiotemporal information of the visual features. We also introduce DiffSLT-P, a DiffSLT variant that conditions on both pseudo-glosses and visual features, providing key textual guidance and reducing the modality gap. As a result, DiffSLT and DiffSLT-P significantly improve diversity over prior gloss-free SLT methods and achieve state-of-the-art performance on the SLT datasets, markedly improving translation quality.
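The core idea in the abstract, turning random noise into a target latent under a visual condition, can be illustrated with a generic DDPM-style reverse loop. The sketch below is not the authors' implementation: the linear noise schedule, the mean-pooled condition, the two hypothetical candidate latents, and the hand-written `toy_noise_predictor` (standing in for a trained network) are all assumptions made for illustration. It only shows why different noise seeds can end at different, equally valid latents for the same video.

```python
import numpy as np

def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule (an assumed, commonly used default)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def candidate_latents(visual_feats, offset=1.0):
    """Hypothetical: two equally valid target latents for one video."""
    pooled = visual_feats.mean(axis=0)                    # (D,) pooled condition
    return np.stack([pooled + offset, pooled - offset])   # (2, D)

def toy_noise_predictor(z_t, alpha_bar_t, visual_feats):
    """Stand-in for a learned eps_theta(z_t, t, condition); not a real model."""
    modes = candidate_latents(visual_feats)
    nearest = modes[np.argmin(np.linalg.norm(modes - z_t, axis=1))]
    # Noise implied by z_t = sqrt(ab) * x0 + sqrt(1 - ab) * eps with x0 = nearest.
    return (z_t - np.sqrt(alpha_bar_t) * nearest) / np.sqrt(1.0 - alpha_bar_t)

def sample_latent(visual_feats, seed, num_steps=1000):
    """Run the reverse diffusion process from pure noise to a latent."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    z = rng.standard_normal(visual_feats.shape[1])        # start from noise
    for t in range(num_steps - 1, -1, -1):
        eps = toy_noise_predictor(z, alpha_bars[t], visual_feats)
        mean = (z - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # Fresh noise is injected at every step except the last one.
        noise = rng.standard_normal(z.shape) if t > 0 else 0.0
        z = mean + np.sqrt(betas[t]) * noise
    return z

# Made-up "visual features": 16 frames, 8 feature dimensions.
video_feats = np.random.default_rng(0).standard_normal((16, 8)) * 0.1
samples = [sample_latent(video_feats, seed=s) for s in range(4)]
```

Each entry of `samples` lands on one of the two hypothetical candidate latents, and which one depends on the seed. In a real system a text decoder would map each latent to a sentence; that step, the Guidance Fusion Module, and the pseudo-gloss conditioning of DiffSLT-P are outside the scope of this sketch.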




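Source journal

Pattern Recognition Letters (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 12.40

Self-citation rate: 5.90%

Articles published: 287

Review time: 9.1 months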

Journal description: Pattern Recognition Letters aims at rapid publication of concise articles of a broad interest in pattern recognition. Subject areas include all the current fields of interest represented by the Technical Committees of the International Association of Pattern Recognition, and other developing themes involving learning and recognition.