Objective and subjective evaluation of speech enhancement methods in the UDASE task of the 7th CHiME challenge

IF 3.1 | Zone 3 (Computer Science) | JCR Q2: Computer Science, Artificial Intelligence
Simon Leglaive, Matthieu Fraticelli, Hend ElGhazaly, Léonie Borne, Mostafa Sadeghi, Scott Wisdom, Manuel Pariente, John R. Hershey, Daniel Pressnitzer, Jon P. Barker
Journal: Computer Speech and Language. Published: 2024-07-06. DOI: 10.1016/j.csl.2024.101685. Article page: https://www.sciencedirect.com/science/article/pii/S0885230824000688
Citations: 0

Abstract

Supervised models for speech enhancement are trained using artificially generated mixtures of clean speech and noise signals. However, the synthetic training conditions may not accurately reflect real-world conditions encountered during testing. This discrepancy can result in poor performance when the test domain significantly differs from the synthetic training domain. To tackle this issue, the UDASE task of the 7th CHiME challenge aimed to leverage real-world noisy speech recordings from the test domain for unsupervised domain adaptation of speech enhancement models. Specifically, this test domain corresponds to the CHiME-5 dataset, characterized by real multi-speaker and conversational speech recordings made in noisy and reverberant domestic environments, for which ground-truth clean speech signals are not available. In this paper, we present the objective and subjective evaluations of the systems that were submitted to the CHiME-7 UDASE task, and we provide an analysis of the results. This analysis reveals a limited correlation between subjective ratings and several supervised nonintrusive performance metrics recently proposed for speech enhancement. Conversely, the results suggest that more traditional intrusive objective metrics can be used for in-domain performance evaluation using the reverberant LibriCHiME-5 dataset developed for the challenge. The subjective evaluation indicates that all systems successfully reduced the background noise, but always at the expense of increased distortion. Out of the four speech enhancement methods evaluated subjectively, only one demonstrated an improvement in overall quality compared to the unprocessed noisy speech, highlighting the difficulty of the task. The tools and audio material created for the CHiME-7 UDASE task are shared with the community.
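
The distinction the abstract draws between intrusive and nonintrusive metrics can be made concrete with a small example. An intrusive metric compares the enhanced signal against a clean reference, which is why it can only be computed on data such as LibriCHiME-5 where a reference exists. The sketch below implements the scale-invariant signal-to-distortion ratio (SI-SDR), a widely used intrusive metric; the abstract does not name the specific metrics used in the challenge, so the choice of SI-SDR, the function name `si_sdr`, and the synthetic sine and cosine signals are assumptions made purely for illustration.

```python
import math


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB. Intrusive: requires the clean reference."""
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")
    # Remove the mean so that a DC offset does not affect the score.
    mean_e = sum(estimate) / len(estimate)
    mean_r = sum(reference) / len(reference)
    e = [x - mean_e for x in estimate]
    r = [x - mean_r for x in reference]
    # Project the estimate onto the reference to get the scaled target.
    dot = sum(a * b for a, b in zip(e, r))
    ref_energy = sum(b * b for b in r)
    alpha = dot / (ref_energy + eps)
    target = [alpha * b for b in r]
    # Everything not explained by the scaled reference counts as distortion.
    residual = [a - t for a, t in zip(e, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(x * x for x in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))


# Hypothetical toy signals (not challenge data): a sine "speech" reference
# and a cosine "noise" at a different frequency.
reference = [math.sin(0.1 * n) for n in range(1000)]
noise = [0.1 * math.cos(0.37 * n) for n in range(1000)]
noisy = [s + v for s, v in zip(reference, noise)]
noisier = [s + 3.0 * v for s, v in zip(reference, noise)]

print(f"SI-SDR, mild noise:   {si_sdr(noisy, reference):.2f} dB")
print(f"SI-SDR, strong noise: {si_sdr(noisier, reference):.2f} dB")
```

A nonintrusive metric, by contrast, would take only the enhanced signal as input, which is what makes it applicable to real CHiME-5 recordings but also, according to the abstract, only weakly correlated with the subjective ratings.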

Source journal
Computer Speech and Language (Engineering & Technology; Computer Science: Artificial Intelligence)
CiteScore: 11.30
Self-citation rate: 4.70%
Articles published: 80
Review time: 22.9 weeks
About the journal: Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language. The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing has become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.