Using Artificial Neural Network for Robust Voice Activity Detection Under Adverse Conditions
Authors: T. V. Pham, Chien T. Tang, M. Stadtschnitzer
Published in: 2009 IEEE-RIVF International Conference on Computing and Communication Technologies
Publication date: 2009-07-13
DOI: 10.1109/RIVF.2009.5174662 (https://doi.org/10.1109/RIVF.2009.5174662)
Citations: 23
Abstract
We present an approach to model-based voice activity detection (VAD) for harsh environments. Using mel-frequency cepstral coefficient (MFCC) features extracted from clean and noisy speech samples, an artificial neural network is trained optimally to provide a reliable model. The study has three main aspects. First, in addition to the developed model, recent state-of-the-art VAD methods are analyzed extensively. Second, we present an optimization procedure for neural network training, including evaluation of the trained network's performance with appropriate measures. Third, we provide a large set of empirical results on the noisy TIMIT and SNOW corpora, covering different types of noise at different signal-to-noise ratios. We evaluate the VAD model on the noisy corpora and compare it against state-of-the-art VAD methods such as ITU-T Rec. G.729 Annex B, ETSI AFE ES 202 050, and recent promising VAD algorithms. Results show that: (i) the proposed neural network classifier using MFCC features achieves robustly high scores under different noisy conditions; (ii) the proposed model is superior to other VAD methods in terms of various classification measures; (iii) the robustness of the developed VAD algorithm still holds when it is tested in a completely mismatched environment.
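The evaluation setup the abstract describes (noise added at different signal-to-noise ratios, then frame-level scoring with classification measures) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the abstract does not say how the noisy corpora were constructed or which measures were used, so `mix_at_snr`, `measured_snr_db`, `frame_measures`, and the three measures shown are hypothetical stand-ins, not the authors' procedure.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it.

    Assumption: both inputs are equal-length lists of float samples.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("signals must have non-zero power")
    # SNR_dB = 10 * log10(p_speech / (gain^2 * p_noise)); solve for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """SNR of `noisy` relative to the known clean `speech` (same length)."""
    p_speech = sum(s * s for s in speech)
    p_residual = sum((y - s) ** 2 for s, y in zip(speech, noisy))
    return 10 * math.log10(p_speech / p_residual)


def frame_measures(reference, predicted):
    """Frame-level measures; labels are 1 (speech) or 0 (non-speech).

    Assumption: these three measures are common choices for VAD scoring,
    not necessarily the ones used in the paper.
    """
    if len(reference) != len(predicted) or not reference:
        raise ValueError("label sequences must be non-empty and equal length")
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)
    tn = sum(1 for r, p in pairs if r == 0 and p == 0)
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)
    return {
        # Fraction of speech frames detected as speech.
        "speech_hit_rate": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Fraction of non-speech frames wrongly flagged as speech.
        "false_alarm_rate": fp / (fp + tn) if (fp + tn) else float("nan"),
        "accuracy": (tp + tn) / len(pairs),
    }
```

In such a setup, each test utterance would be mixed with each noise type at each target SNR, passed through the detector, and scored frame by frame against reference speech/non-speech labels. A mismatched-environment test would use noise types that were not seen during training.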