Using Artificial Neural Network for Robust Voice Activity Detection Under Adverse Conditions

T. V. Pham, Chien T. Tang, M. Stadtschnitzer
2009 IEEE-RIVF International Conference on Computing and Communication Technologies, published 2009-07-13. DOI: 10.1109/RIVF.2009.5174662
Citations: 23

Abstract

We present an approach to model-based voice activity detection (VAD) for harsh environments. Using mel-frequency cepstral coefficient (MFCC) features extracted from clean and noisy speech samples, an artificial neural network is trained optimally to provide a reliable model. The study has three main aspects. First, in addition to the developed model, recent state-of-the-art VAD methods are analyzed extensively. Second, we present an optimization procedure for neural network training, including evaluation of the trained network's performance with appropriate measures. Third, we provide a large set of empirical results on the noisy TIMIT and SNOW corpora, covering different types of noise at different signal-to-noise ratios (SNRs). We evaluate the resulting VAD model on these noisy corpora and compare it against state-of-the-art VAD methods such as ITU-T Rec. G.729 Annex B, ETSI AFE ES 202 050, and other recent promising VAD algorithms. The results show that: (i) the proposed neural network classifier using MFCC features achieves consistently high scores under different noise conditions; (ii) the proposed model outperforms the other VAD methods on various classification measures; and (iii) the developed VAD algorithm remains robust when tested in a completely mismatched environment.
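The abstract does not spell out the pipeline, so the following Python sketch only illustrates the kind of setup it describes, under stated assumptions: noise is added to clean speech at a target SNR by scaling the noise power; each frame's feature vector (for example an MFCC vector, whose extraction is not shown) is classified by a small feed-forward network with one hidden layer; and the decisions are scored against reference labels with frame-level measures such as speech and non-speech hit rates, precision, F1 and accuracy. The function names (`mix_at_snr`, `mlp_is_speech`, `vad_scores`), the network shape and the choice of measures are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def power(x: Sequence[float]) -> float:
    """Mean power of a signal: the average of its squared samples."""
    return sum(s * s for s in x) / len(x)


def mix_at_snr(speech: Sequence[float], noise: Sequence[float],
               snr_db: float) -> list[float]:
    """Add noise to clean speech, scaled so the mixture has SNR `snr_db`.

    Assumes `noise` is at least as long as `speech` and is not all zeros.
    """
    noise = list(noise[: len(speech)])
    # Gain g satisfies 10*log10(P_speech / (g**2 * P_noise)) == snr_db.
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mlp_is_speech(features: Sequence[float],
                  w_hidden: Sequence[Sequence[float]],
                  b_hidden: Sequence[float],
                  w_out: Sequence[float],
                  b_out: float,
                  threshold: float = 0.5) -> int:
    """Classify one feature frame (e.g. an MFCC vector) as speech (1) or not (0).

    Hypothetical network: one hidden layer of tanh units and a sigmoid
    output read as P(speech). The weights are inputs; training is not shown.
    """
    hidden = [math.tanh(sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    p_speech = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return int(p_speech >= threshold)


def vad_scores(reference: Sequence[int],
               predicted: Sequence[int]) -> dict[str, float]:
    """Frame-level VAD measures; labels are 1 = speech, 0 = non-speech."""
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)
    tn = sum(1 for r, p in pairs if r == 0 and p == 0)
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)
    hr1 = tp / (tp + fn) if tp + fn else 0.0  # speech hit rate
    hr0 = tn / (tn + fp) if tn + fp else 0.0  # non-speech hit rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * hr1 / (precision + hr1) if precision + hr1 else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"HR1": hr1, "HR0": hr0, "precision": precision,
            "F1": f1, "accuracy": accuracy}
```

For example, `mix_at_snr(speech, noise, 5.0)` yields a 5 dB mixture; classifying each frame's feature vector with `mlp_is_speech` and passing the decisions to `vad_scores` together with reference labels gives one set of scores per noise type and SNR. Speech and non-speech hit rates are reported separately because overall accuracy can look high simply by favoring whichever class dominates the test material.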