Enhancing speech separation performance utilizing various wavelet coefficients.

Impact factor 2.1 · Zone 2 (Physics and Astrophysics) · JCR Q2 (Acoustics)
Rawad Melhem, Oumayma Al Dakkak, Assef Jafar
Journal of the Acoustical Society of America, vol. 158, no. 1, pp. 201-209. Journal Article, published 2025-07-01.
DOI: 10.1121/10.0037082 (https://doi.org/10.1121/10.0037082)
Citations: 0

Abstract

This study explores the efficacy of wavelet coefficients in improving speech separation models for real-world scenarios, where performance often degrades relative to ideal conditions. Feature distortion in practical environments hampers speaker discrimination, which motivates the search for features more robust than traditional inputs. Whereas the wavelet transform (WT) is typically employed in classification tasks, this research examines its potential in speech separation. By integrating discrete wavelet and wavelet packet coefficients during model training, the study evaluates the impact of WT on speech separation. It also addresses the challenge of incorporating wavelet scattering (WS), which lacks an exact inverse transform, into speech separation tasks. To overcome this limitation, WS coefficients are integrated into the loss function rather than used as a representation that must be inverted. Results demonstrate the superior performance and resilience of wavelet-based models in noisy conditions. In particular, integrating WS coefficients enhances separation accuracy, surpassing other methods in key metrics such as scale-invariant signal-to-distortion ratio, mean opinion score, and short-time objective intelligibility, establishing wavelet coefficients as state-of-the-art solutions for speech separation in challenging acoustic environments.
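The scale-invariant signal-to-distortion ratio named among the metrics can be illustrated with a short sketch. This is the commonly used definition of the metric, not code from the paper; the function name `si_sdr`, the `eps` stabilizer, and the synthetic test signals are assumptions made for illustration only.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB for 1-D signals.

    Hypothetical helper for illustration; not taken from the paper.
    """
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    # Remove the mean so a DC offset does not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to obtain the target component.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    # Everything in the estimate that is not explained by the reference.
    distortion = estimate - target
    ratio = (np.dot(target, target) + eps) / (np.dot(distortion, distortion) + eps)
    return 10.0 * np.log10(ratio)


# Synthetic stand-ins for a clean source and a separated estimate (assumed data).
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
noisy = clean + 0.1 * rng.standard_normal(16000)

score = si_sdr(noisy, clean)
rescaled_score = si_sdr(3.0 * noisy, clean)  # rescaling the estimate should not matter
```

Because the estimate is projected onto the reference before the ratio is taken, multiplying the estimate by a constant leaves the score essentially unchanged, which is why the metric suits separation models whose output gain is arbitrary.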

Source journal metrics: CiteScore 4.60; self-citation rate 16.70%; articles published 1433; review time 4.7 months.
About the journal: Since 1929 The Journal of the Acoustical Society of America has been the leading source of theoretical and experimental research results in the broad interdisciplinary study of sound. Subject coverage includes: linear and nonlinear acoustics; aeroacoustics, underwater sound and acoustical oceanography; ultrasonics and quantum acoustics; architectural and structural acoustics and vibration; speech, music and noise; psychology and physiology of hearing; engineering acoustics, transduction; bioacoustics, animal bioacoustics.