Noise Model Transfer: Novel Approach to Robustness Against Nonstationary Noise

IEEE Transactions on Audio Speech and Language Processing, Pub Date: 2013-10-01, DOI: 10.1109/TASL.2013.2272513

Takuya Yoshioka, T. Nakatani

Volume 21, pages 2182-2192. Publication type: Journal Article. https://doi.org/10.1109/TASL.2013.2272513

Citations: 11

Abstract



This paper proposes an approach, called noise model transfer (NMT), for estimating the rapidly changing parameter values of a feature-domain noise model, which can be used to enhance feature vectors corrupted by highly nonstationary noise. Unlike conventional methods, the proposed approach can exploit both observed feature vectors, representing spectral envelopes, and other signal properties that are usually discarded during feature extraction but that are useful for separating nonstationary noise from speech. Specifically, we assume the availability of a noise power spectrum estimator that can capture rapid changes in noise characteristics by leveraging such signal properties. NMT determines the optimal transformation from the estimated noise power spectra into the feature-domain noise model parameter values in the sense of maximum likelihood. NMT is successfully applied to meeting speech recognition, where the main noise sources are competing talkers; and reverberant speech recognition, where the late reverberation is regarded as highly nonstationary additive noise.
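As a rough illustration of mapping estimated noise spectra to feature-domain noise model parameters by maximum likelihood, here is a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify the form of the transformation or of the likelihood. Purely for illustration, the sketch assumes a per-channel affine map `a * x + b`, a Gaussian observation model, and frames already known to be noise-dominated, in which case the maximum-likelihood solution reduces to least squares. The names `fit_affine_ml` and `noise_model_means` and the synthetic data are hypothetical.

```python
import random


def fit_affine_ml(noise_feats, observed_feats):
    """Closed-form ML fit of y_t ~ N(a * x_t + b, var) for one feature channel.

    noise_feats: log-domain features x_t derived from estimated noise power
        spectra (assumed input, one value per frame).
    observed_feats: observed features y_t on frames assumed noise-dominated.
    Returns (a, b, var).
    """
    n = len(noise_feats)
    if n != len(observed_feats) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 frames")
    mean_x = sum(noise_feats) / n
    mean_y = sum(observed_feats) / n
    sxx = sum((x - mean_x) ** 2 for x in noise_feats)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(noise_feats, observed_feats))
    if sxx == 0.0:
        raise ValueError("noise features are constant; scale is not identifiable")
    a = sxy / sxx                 # ML scale (same as least squares here)
    b = mean_y - a * mean_x       # ML bias
    # ML variance estimate: mean squared residual (divides by n, not n - 2).
    var = sum((y - (a * x + b)) ** 2
              for x, y in zip(noise_feats, observed_feats)) / n
    return a, b, var


def noise_model_means(noise_feats, a, b):
    """Frame-by-frame noise model means from the fitted affine transform."""
    return [a * x + b for x in noise_feats]


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic, rapidly varying "estimated noise" log features (made-up data).
    x = [rng.uniform(-2.0, 2.0) for _ in range(500)]
    # Observations generated with scale 0.8, bias 1.5, and small Gaussian noise.
    y = [0.8 * xt + 1.5 + rng.gauss(0.0, 0.1) for xt in x]
    a, b, var = fit_affine_ml(x, y)
    print(f"a={a:.3f} b={b:.3f} var={var:.4f}")
```

On the synthetic data, the fitted scale and bias should land close to the values used to generate it. A realistic system would also need a speech model and a way to handle frames where speech and noise overlap, which this toy example deliberately leaves out.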


Source journal

IEEE Transactions on Audio Speech and Language Processing (category: Engineering and Technology, Engineering: Electrical and Electronic)

Self-citation rate

0.00%

Articles published

Review time

24.0 months

About the journal: The IEEE Transactions on Audio, Speech and Language Processing covers the sciences, technologies and applications relating to the analysis, coding, enhancement, recognition and synthesis of audio, music, speech and language. In particular, audio processing also covers auditory modeling, acoustic modeling and source separation. Speech processing also covers speech production and perception, adaptation, lexical modeling and speaker recognition. Language processing also covers spoken language understanding, translation, summarization, mining, general language modeling, as well as spoken dialog systems.