Robust speech recognition using a noise rejection approach

Proceedings. IEEE International Joint Symposia on Intelligence and Systems (Cat. No.98EX174); Pub Date: 1998-03-21; DOI: 10.1109/IJSIS.1998.685469

E. Khan, R. Levinson


Citations: 2

Abstract


Robust speech recognition using a noise rejection approach

In this paper, we explore new approaches to improving speech recognition accuracy in noisy environments. The approach rests on two constraints: (a) no additional data is used for training (only speaker data, no noise data), and (b) there is no adaptation phase for noise. Instead of adapting in the recognition stage, the preprocessing stage, or both, we build a noise-tolerant speech recognition system whose inherent structure lets it reject noise automatically. We call this a noise rejection-based approach. Noise rejection is achieved by using multiple views and dynamic features of the input sequences. Multiple views extract more information from the available data, which is used to train multiple HMMs (hidden Markov models). This makes training simpler and faster, and avoids the need for a noise database, which is often difficult to obtain. The dynamic features (added to the HMM using vector emission probabilities) supply more information about the input speech during training. Because the dynamic feature values of noise are usually much smaller than those of the speech signal, they help reject noise during recognition. During recognition, multiple views of the input sequence are also applied to the multiple HMMs, and the outcomes of the HMMs are combined using a maximum evidence criterion. Our tests show very encouraging results. We also incorporate higher-level decision making to combine the outcomes of the multiple HMMs more judiciously and further improve accuracy. For this, we use meta-reasoning to identify the problem complexity and allocate resources accordingly.
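The abstract does not give formulas for the dynamic features or for the maximum evidence criterion, so the following is only a rough sketch under stated assumptions. It assumes the widely used regression-style delta feature as the "dynamic feature" and reads "maximum evidence" as picking the single highest log-likelihood across all views and word models. The function names, the view and word labels, and the scores are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple


def delta_features(frames: Sequence[float], window: int = 2) -> List[float]:
    """Regression-based delta (dynamic) feature for a 1-D feature track.

    Assumption: the common formula
        d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2)
    with edge frames repeated as padding. The paper may define its
    dynamic features differently.
    """
    denom = 2 * sum(n * n for n in range(1, window + 1))
    last = len(frames) - 1
    deltas = []
    for t in range(len(frames)):
        num = 0.0
        for n in range(1, window + 1):
            ahead = frames[min(t + n, last)]   # clamp at the final frame
            behind = frames[max(t - n, 0)]     # clamp at the first frame
            num += n * (ahead - behind)
        deltas.append(num / denom)
    return deltas


def combine_max_evidence(
    scores: Dict[str, Dict[str, float]]
) -> Tuple[str, str, float]:
    """Pick the (word, view, score) triple with the highest log-likelihood.

    `scores[view][word]` is a hypothetical log-likelihood produced by the
    HMM for `word` when fed `view` of the input. Taking the single largest
    score is one possible reading of "maximum evidence"; it is an
    assumption made for this sketch.
    """
    best = None
    for view, per_word in scores.items():
        for word, loglik in per_word.items():
            if best is None or loglik > best[2]:
                best = (word, view, loglik)
    if best is None:
        raise ValueError("no scores to combine")
    return best


if __name__ == "__main__":
    steady_noise = [0.5] * 6                      # constant track: no dynamics
    speech_like = [0.0, 1.0, 3.0, 2.0, 0.5, 0.0]  # rapidly changing track
    print(delta_features(steady_noise))
    print(delta_features(speech_like))

    # Made-up log-likelihoods for two views and two word models.
    example_scores = {
        "view_a": {"yes": -42.0, "no": -55.5},
        "view_b": {"yes": -47.3, "no": -40.1},
    }
    print(combine_max_evidence(example_scores))
```

In this toy setup a perfectly steady track yields all-zero deltas while the changing track does not, which illustrates the intuition that slowly varying noise contributes little to the dynamic features. Real noise is rarely this stationary, so the effect in practice would be a matter of degree.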
