Enhancements in Assamese spoken query system: Enabling background noise suppression and flexible queries

2016 Twenty Second National Conference on Communication (NCC) Pub Date : 2016-03-04 DOI:10.1109/NCC.2016.7561193

Abhishek Dey, S. Shahnawazuddin, K. Deepak, Siddika Imani, S. Prasanna, R. Sinha


Citations: 9

Abstract

This paper discusses recent improvements to a previously developed Assamese spoken query (SQ) system for accessing the prices of agricultural commodities. The SQ system consists of interactive voice response (IVR) and automatic speech recognition (ASR) modules, both developed using open-source resources. The speech data used to develop the ASR system was collected under field conditions and therefore contains a high level of background noise, which severely degraded the recognition performance of the earlier version of the SQ system. To address this, a front-end noise suppression module based on zero frequency filtering has been added in the current version. Furthermore, we have incorporated subspace Gaussian mixture model (SGMM) and deep neural network (DNN)-based acoustic modeling approaches, which are found to be more effective than the Gaussian mixture model (GMM)-based approach employed in the previous version. The combination of noise removal and DNN-based acoustic modeling yields a relative improvement of almost 32% in word error rate over the earlier reported GMM-HMM-based ASR system. The earlier SQ system was designed to expect user queries in the form of isolated words only; consequently, highly degraded recognition performance was observed whenever queries took the form of continuous sentences. To overcome this, we present a simple technique that exploits the inherent patterns in user queries. These patterns are incorporated into the language model, and the modified language model is observed to significantly improve recognition performance for continuous queries.
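The abstract reports its main result as a relative improvement in word error rate (WER). As a reading aid, the sketch below shows how WER and a relative WER reduction are conventionally computed. It is a minimal illustration under stated assumptions, not the authors' scoring pipeline: the function names `word_error_rate` and `relative_wer_reduction` are hypothetical, and the abstract does not say which scoring tool was used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer is lower than baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer
```

A relative figure compares the two error rates to each other rather than subtracting them. For instance, a baseline WER of 25% falling to 17% is an 8-point absolute reduction but a 32% relative reduction. These two WER values are invented purely for illustration; the abstract states only the relative improvement, not the underlying error rates.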
