Adversarial training for data-driven speech enhancement without parallel corpus

T. Higuchi, K. Kinoshita, Marc Delcroix, T. Nakatani
DOI: 10.1109/ASRU.2017.8268914
Published in: 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), December 2017
Citations: 23

Abstract

This paper describes a way of performing data-driven speech enhancement for noise robust automatic speech recognition (ASR), where we train a model for speech enhancement without a parallel corpus. Data-driven speech enhancement with deep models has recently been investigated and proven to be a promising approach for ASR. However, for model training, we need a parallel corpus consisting of noisy speech signals and corresponding clean speech signals for supervision. Therefore a deep model can be trained only with a simulated dataset, and we cannot take advantage of a large number of noisy recordings that do not have corresponding clean speech signals. As a first step towards model training without supervision, this paper proposes a novel approach introducing adversarial training for a time-frequency mask estimator. Our cost function for model training is defined by discriminators instead of by using the distance between the model outputs and the supervision. The discriminators distinguish between true signals and enhanced signals obtained with time-frequency masks estimated with a mask estimator. The mask estimator is trained to cheat the discriminators, which enables the mask estimator to estimate the appropriate time-frequency masks without a parallel corpus. The enhanced signal is finally obtained with masking-based beamforming. Experimental results show that, even without exploiting parallel data, our speech enhancement approach achieves improved ASR performance compared with results obtained with unprocessed signals and achieves comparable ASR performance to that obtained with a model trained with a parallel corpus based on a minimum mean squared error (MMSE) criterion.
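The final enhancement step the abstract mentions, masking-based beamforming, can be sketched numerically: the estimated time-frequency mask weights the multichannel observations to form speech and noise spatial covariance matrices, from which an MVDR beamformer is derived per frequency bin. The sketch below is a minimal illustration under assumed conventions (toy dimensions, a random placeholder mask standing in for the adversarially trained estimator's output, and the steering vector taken as the principal eigenvector of the speech covariance); it is not the paper's exact formulation.

```python
import numpy as np

# Toy dimensions (assumed for illustration): freq bins, frames, microphones
F, T, M = 4, 10, 3
rng = np.random.default_rng(0)

# Multichannel noisy STFT observations Y[f, t, m] (complex-valued)
Y = rng.standard_normal((F, T, M)) + 1j * rng.standard_normal((F, T, M))

# Time-frequency mask in [0, 1]. Random here; in the paper it would come
# from the adversarially trained mask estimator.
mask = rng.uniform(size=(F, T))

def spatial_cov(Y, w):
    """Mask-weighted spatial covariance per frequency:
    sum_t w[f,t] * y_{f,t} y_{f,t}^H / sum_t w[f,t]."""
    num = np.einsum("ft,ftm,ftn->fmn", w, Y, Y.conj())
    return num / w.sum(axis=1)[:, None, None]

Phi_s = spatial_cov(Y, mask)        # speech covariance estimate
Phi_n = spatial_cov(Y, 1.0 - mask)  # noise covariance estimate

# MVDR beamformer per frequency bin: w = Phi_n^{-1} d / (d^H Phi_n^{-1} d),
# with the steering vector d taken as the principal eigenvector of Phi_s.
enhanced = np.empty((F, T), dtype=complex)
for f in range(F):
    _, vecs = np.linalg.eigh(Phi_s[f])      # Hermitian eigendecomposition
    d = vecs[:, -1]                         # principal eigenvector
    w_num = np.linalg.solve(Phi_n[f], d)    # Phi_n^{-1} d
    w = w_num / (d.conj() @ w_num)          # normalize (distortionless constraint)
    enhanced[f] = Y[f] @ w.conj()           # beamformer output per frame

print(enhanced.shape)
```

The single-channel `enhanced` spectrogram would then be inverted back to a waveform; only the mask estimator is trained, so the beamforming stage itself needs no parallel data.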