Enhancing speech separation performance utilizing various wavelet coefficients
Authors: Rawad Melhem, Oumayma Al Dakkak, Assef Jafar
Journal of the Acoustical Society of America, 158(1), pp. 201-209. Published 2025-07-01. Journal Article.
DOI: 10.1121/10.0037082 (https://doi.org/10.1121/10.0037082)
Impact factor: 2.1. JCR: Q2 (Acoustics).
Citation count: 0
Abstract
This study examines whether wavelet coefficients can improve speech separation models in real-world scenarios, where performance often degrades relative to ideal conditions. Feature distortion in practical environments hampers speaker discrimination, which motivates the search for features more robust than traditional inputs. Whereas the wavelet transform (WT) is typically employed in classification tasks, this research explores its potential in speech separation. By integrating discrete wavelet and wavelet packet coefficients during model training, the study evaluates the impact of the WT on speech separation. It also addresses the challenge of incorporating wavelet scattering (WS), which lacks an exact inverse transform, into speech separation tasks. To overcome this limitation, wavelet scattering coefficients are integrated into the loss function, where only the forward transform is needed. Results demonstrate the superior performance and resilience of wavelet-based models in noisy conditions. In particular, integrating WS coefficients enhances separation accuracy, surpassing other methods in key metrics such as scale-invariant signal-to-distortion ratio (SI-SDR), mean opinion score (MOS), and short-time objective intelligibility (STOI), establishing wavelet coefficients as a state-of-the-art solution for speech separation in challenging acoustic environments.
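The idea of placing wavelet coefficients in the loss function, so that no inverse transform is required, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the loss formula, so the function names, the `weight` parameter, and the magnitude-based distance are all hypothetical. A multi-level Haar discrete wavelet transform stands in for wavelet scattering to keep the example dependency-free. Only `si_sdr` follows a standard, widely used definition.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB (standard definition)."""
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to obtain the scaled target.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    noise = estimate - target
    ratio = (np.dot(target, target) + eps) / (np.dot(noise, noise) + eps)
    return 10.0 * np.log10(ratio)


def haar_dwt(signal, levels=3):
    """Multi-level orthonormal Haar DWT.

    Returns [approximation, detail_L, ..., detail_1] (coarsest detail first).
    """
    approx = np.asarray(signal, dtype=float)
    details = []
    for _ in range(levels):
        if approx.size % 2:
            # Pad odd-length input by repeating the last sample.
            approx = np.append(approx, approx[-1])
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return [approx] + details[::-1]


def wavelet_feature_distance(estimate, reference, levels=3):
    """Hypothetical feature term: mean squared difference of coefficient
    magnitudes, averaged over sub-bands. Uses the forward transform only."""
    est_bands = haar_dwt(estimate, levels)
    ref_bands = haar_dwt(reference, levels)
    per_band = [
        np.mean((np.abs(e) - np.abs(r)) ** 2)
        for e, r in zip(est_bands, ref_bands)
    ]
    return float(np.mean(per_band))


def combined_loss(estimate, reference, weight=0.1):
    """Assumed loss: negative SI-SDR plus a weighted wavelet-feature term."""
    return -si_sdr(estimate, reference) + weight * wavelet_feature_distance(
        estimate, reference
    )
```

In a training setup, a differentiable version of the same computation (for example in JAX or PyTorch) would replace the NumPy calls, and a true scattering transform would replace `haar_dwt`. Either way, the loss needs only forward coefficients of the estimated and reference signals.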
About the journal:
Since 1929 The Journal of the Acoustical Society of America has been the leading source of theoretical and experimental research results in the broad interdisciplinary study of sound. Subject coverage includes: linear and nonlinear acoustics; aeroacoustics, underwater sound and acoustical oceanography; ultrasonics and quantum acoustics; architectural and structural acoustics and vibration; speech, music and noise; psychology and physiology of hearing; engineering acoustics, transduction; bioacoustics, animal bioacoustics.