Bangla Speech Recognition System Using LPC and ANN

2009 Seventh International Conference on Advances in Pattern Recognition, Pub Date: 2009-02-04, DOI: 10.1109/ICAPR.2009.80

Anupama Paul, Dipankar Das, M. Kamal

Citations: 91

Abstract

This paper presents a Bangla speech recognition system divided into two major parts: speech signal processing and speech pattern recognition. The speech processing stage consists of speech start- and end-point detection, windowing, filtering, computation of the Linear Predictive Coding (LPC) and cepstral coefficients, and finally construction of a codebook by vector quantization. The second part is a pattern recognition system based on an Artificial Neural Network (ANN). Speech signals are recorded with an audio wave recorder in a normal room environment. Each recorded signal is passed through a start- and end-point detection algorithm that detects the presence of speech and removes the silent and paused portions. The resulting signal is then filtered to remove unwanted background noise. The filtered signal is windowed with a half-frame overlap, after which the LPC and cepstral coefficients are computed. The feature extractor uses a standard LPC cepstrum coder, which converts the incoming speech signal into the LPC cepstrum feature space. A Self-Organizing Map (SOM) neural network converts each variable-length LPC trajectory of an isolated word into a fixed-length LPC trajectory, yielding the fixed-length feature vector that is fed into the recognizer. The neural network follows a Multi-Layer Perceptron (MLP) design and was tested with 3, 4, and 5 hidden layers using the tanh sigmoid transfer function. Different neural network structures are compared to better understand the problem and its possible solutions.
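The front end described in the abstract (end-point detection, then windowing with half-frame overlap) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code: the abstract does not say how end points are detected, so a simple short-time-energy threshold (the hypothetical `ratio` parameter) stands in for it; only leading and trailing silence is trimmed, not pauses inside a word; the Hamming window is an assumed choice; and the noise-removal filter is left out because its type is not given. All function names are hypothetical.

```python
import math

def frame_signal(x, frame_len):
    """Split x into frames of frame_len samples advancing by frame_len // 2,
    so consecutive frames overlap by half a frame. A trailing remainder
    shorter than one frame is dropped."""
    hop = frame_len // 2
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]

def hamming(n):
    """Hamming window of length n (assumed window type)."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def trim_silence(x, frame_len, ratio=0.1):
    """Assumed energy-based start/end-point detection: keep the samples from
    the first to the last frame whose short-time energy exceeds
    ratio * (maximum frame energy). Returns [] if no frame qualifies."""
    frames = frame_signal(x, frame_len)
    if not frames:
        return []
    energies = [sum(s * s for s in f) for f in frames]
    threshold = ratio * max(energies)
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return []  # all-silent input
    hop = frame_len // 2
    return x[active[0] * hop: active[-1] * hop + frame_len]

def windowed_frames(x, frame_len):
    """Half-overlapping frames, each multiplied by a Hamming window."""
    w = hamming(frame_len)
    return [[s * wk for s, wk in zip(f, w)] for f in frame_signal(x, frame_len)]
```

For example, `windowed_frames(trim_silence(signal, 200), 200)` would yield 200-sample windows that advance by 100 samples; the frame length itself is another unstated parameter.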
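The per-frame LPC and LPC-cepstrum features could be computed as below. The sketch assumes the common autocorrelation method solved with the Levinson-Durbin recursion, followed by the standard LPC-to-cepstrum recursion for an all-pole model x[n] ≈ a_1 x[n-1] + ... + a_p x[n-p]. The prediction order `p` and cepstrum length `q` (both 12 here) are hypothetical defaults, since the abstract gives neither, and the function names are illustrative.

```python
def autocorr(frame, p):
    """Autocorrelation r[0..p] of one (windowed) frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(p + 1)]

def levinson_durbin(r, p):
    """Solve the LPC normal equations for a[1..p] (a[0] is unused and kept at 0).
    Returns (a, err), where err is the final prediction-error energy."""
    a = [0.0] * (p + 1)
    err = r[0]
    for i in range(1, p + 1):
        if err <= 0.0:
            break  # silent or degenerate frame: remaining coefficients stay 0
        # Reflection coefficient for order i.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a, err

def lpc_to_cepstrum(a, p, q):
    """Cepstral coefficients c[1..q] of the all-pole model with predictor a[1..p]:
    c_m = a_m + sum_{k=max(1,m-p)}^{m-1} (k/m) c_k a_{m-k}, with a_m = 0 for m > p."""
    c = [0.0] * (q + 1)
    for m in range(1, q + 1):
        acc = a[m] if m <= p else 0.0
        for k in range(max(1, m - p), m):
            acc += (k / m) * c[k] * a[m - k]
        c[m] = acc
    return c[1:]

def lpc_cepstrum(frame, p=12, q=12):
    """LPC cepstrum feature vector of length q for one frame."""
    a, _ = levinson_durbin(autocorr(frame, p), p)
    return lpc_to_cepstrum(a, p, q)
```

For a first-order model with a_1 = α, the recursion gives c_m = α^m / m, which makes a convenient sanity check.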
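The abstract does not detail how the SOM turns a variable-length trajectory into a fixed-length one. One plausible reading, assumed here, is a one-dimensional chain of `n_nodes` SOM units fitted to the sequence of cepstral frames of a single word: whatever the number of frames, the ordered unit weights, concatenated, form a vector of length `n_nodes * dim`. The initialization from evenly spaced frames, the linearly decaying learning rate, the Gaussian neighborhood and the helper name `som_normalize` are all illustrative assumptions.

```python
import math

def som_normalize(frames, n_nodes=8, epochs=20, lr0=0.5):
    """Fit a 1-D chain of n_nodes SOM units to a variable-length list of
    feature vectors and return the concatenated unit weights, a vector of
    fixed length n_nodes * dim. Hypothetical helper with illustrative schedules."""
    T, dim = len(frames), len(frames[0])
    # Initialize unit j with the frame at the matching relative time position,
    # so the chain starts out ordered along the utterance.
    nodes = [list(frames[round(j * (T - 1) / max(n_nodes - 1, 1))]) for j in range(n_nodes)]
    radius0 = n_nodes / 2.0
    total, step = epochs * T, 0
    for _ in range(epochs):
        for x in frames:
            frac = step / total
            lr = lr0 * (1.0 - frac)                    # linearly decaying learning rate
            radius = max(radius0 * (1.0 - frac), 0.5)  # shrinking neighborhood width
            # Best-matching unit: smallest squared Euclidean distance to x.
            bmu = min(range(n_nodes),
                      key=lambda j: sum((nodes[j][d] - x[d]) ** 2 for d in range(dim)))
            for j in range(n_nodes):
                # Gaussian neighborhood along the chain, centered on the BMU.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                for d in range(dim):
                    nodes[j][d] += lr * h * (x[d] - nodes[j][d])
            step += 1
    return [v for node in nodes for v in node]
```

Because each update moves a unit only part of the way toward a data frame, the unit weights stay within the range of the input features, and words of 37 or 50 frames both map to the same 8 × dim vector length.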
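The recognizer is a Multi-Layer Perceptron with tanh transfer functions. The forward pass below shows how 3, 4 or 5 hidden layers stack on top of the fixed-length feature vector. The layer widths, the number of output words, the uniform weight initialization, the argmax decision rule and the function names are assumptions, and training (for example by backpropagation) is not shown.

```python
import math
import random

def make_mlp(sizes, seed=0):
    """Randomly initialized layers for the given sizes, e.g. [24, 16, 16, 16, 10]
    (input, three hidden layers, output). Each layer is (W, b), W has n_out rows."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        W = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
        b = [0.0] * n_out
        layers.append((W, b))
    return layers

def mlp_forward(x, layers):
    """Apply tanh(W h + b) layer by layer and return the output activations."""
    h = x
    for W, b in layers:
        h = [math.tanh(sum(w * v for w, v in zip(row, h)) + bj) for row, bj in zip(W, b)]
    return h

def classify(x, layers):
    """Index of the output unit with the largest activation (assumed decision rule)."""
    out = mlp_forward(x, layers)
    return max(range(len(out)), key=out.__getitem__)
```

For instance, `make_mlp([24] + [16] * 4 + [10])` builds a hypothetical four-hidden-layer network for 10 words; varying the hidden-layer count mirrors the 3/4/5-layer comparison in the abstract.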
