DiffSLT: Enhancing diversity in sign language translation via diffusion model

IF 3.3 · Zone 3 (Computer Science) · JCR Q2 (Computer Science, Artificial Intelligence)
JiHwan Moon, Jihoon Park, Jungeun Kim, Jongseong Bae, Hyeongwoo Jeon, Ha Young Kim
Pattern Recognition Letters, vol. 196, pp. 117-125. Published 2025-06-09. DOI: 10.1016/j.patrec.2025.06.008
https://www.sciencedirect.com/science/article/pii/S0167865525002363
Citations: 0

Abstract

Sign language translation (SLT) is challenging because it converts sign language videos into natural language, crossing modalities. Previous studies have prioritized accuracy over diversity. However, diversity is crucial for handling lexical and syntactic ambiguities in machine translation, which suggests it could similarly benefit SLT. In this work, we propose DiffSLT, a gloss-free SLT framework that leverages a diffusion model to enable diverse translations while preserving sign language semantics. DiffSLT transforms random noise into the target latent representation, conditioned on the visual features of the input video. To enhance visual conditioning, we design a Guidance Fusion Module, which integrates the multi-level spatiotemporal information of the visual features. We also introduce DiffSLT-P, a DiffSLT variant that conditions on both pseudo-glosses and visual features, providing key textual guidance and reducing the modality gap. As a result, DiffSLT and DiffSLT-P significantly improve diversity over prior gloss-free SLT methods and achieve state-of-the-art performance on the SLT datasets, markedly improving translation quality.
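
The abstract's central mechanism, turning random noise into a latent representation under visual conditioning, can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a standard DDPM-style reverse process with a linear noise schedule, and every name in it (`make_schedule`, `pool_condition`, `toy_denoiser`, `sample_latent`) is hypothetical. Mean pooling stands in for the Guidance Fusion Module, and a closed-form "oracle" stands in for the trained denoising network.

```python
import math
import random


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule; returns (betas, alphas, cumulative alpha products)."""
    betas = [beta_start + (beta_end - beta_start) * t / (num_steps - 1)
             for t in range(num_steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars = []
    running = 1.0
    for a in alphas:
        running *= a
        alpha_bars.append(running)
    return betas, alphas, alpha_bars


def pool_condition(frame_features):
    """Hypothetical conditioning: mean-pool per-frame features into one vector."""
    n = len(frame_features)
    dim = len(frame_features[0])
    return [sum(f[i] for f in frame_features) / n for i in range(dim)]


def toy_denoiser(x_t, alpha_bar_t, cond):
    """Stand-in for a trained network (assumption): predicts the noise in x_t
    by pretending the clean latent is exactly the condition vector."""
    scale = math.sqrt(alpha_bar_t)
    denom = math.sqrt(1.0 - alpha_bar_t)
    return [(x - scale * c) / denom for x, c in zip(x_t, cond)]


def sample_latent(cond, num_steps=50, seed=0):
    """Reverse diffusion: start from Gaussian noise, denoise step by step."""
    rng = random.Random(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    x = [rng.gauss(0.0, 1.0) for _ in cond]
    start = list(x)  # keep the initial noise for inspection
    for t in reversed(range(num_steps)):
        eps_hat = toy_denoiser(x, alpha_bars[t], cond)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * e) / math.sqrt(alphas[t])
                for xi, e in zip(x, eps_hat)]
        if t > 0:
            # Inject fresh noise at every step except the last one.
            sigma = math.sqrt(betas[t])
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean
    return start, x


if __name__ == "__main__":
    frames = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]  # made-up per-frame features
    cond = pool_condition(frames)
    noise, latent = sample_latent(cond, seed=1)
    print("initial noise:", noise)
    print("final latent: ", latent)
```

Because the oracle always points at one fixed target, every seed ends at the same latent here. In a trained model the denoiser would represent a distribution over plausible latents, so different noise draws could decode to different translations, which is the diversity the abstract refers to.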
Source journal
Pattern Recognition Letters (Engineering & Technology: Computer Science, Artificial Intelligence)
CiteScore: 12.40
Self-citation rate: 5.90%
Articles published: 287
Review time: 9.1 months
Journal description: Pattern Recognition Letters aims at rapid publication of concise articles of broad interest in pattern recognition. Subject areas include all the current fields of interest represented by the Technical Committees of the International Association for Pattern Recognition, and other developing themes involving learning and recognition.