Auditory-Based Wavelet Packet Filterbank for Speech Recognition Using Neural Network

15th International Conference on Advanced Computing and Communications (ADCOM 2007) Pub Date : 2007-12-18 DOI:10.1109/ADCOM.2007.47

R. Gandhiraj, P. S. Sathidevi

Citations: 30

Abstract

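A major problem of most speech recognition systems is their unsatisfactory robustness to noise. Feature extraction modeled on the human inner ear leads to very robust speech understanding in noise, so a model of the auditory periphery serves as the front end of the speech recognition process here. This paper describes two quantitative models of signal processing in the auditory system, (i) a Gammatone Filter Bank (GTFB) and (ii) a Wavelet Packet (WP) decomposition, as front ends for robust speech recognition. The auditory feature vectors were used to train a neural network, which classified them using the Back Propagation (BP) algorithm. System performance was measured as the recognition rate at signal-to-noise ratios (SNRs) from -10 to 10 dB. The proposed system was compared with other front ends and recognition methods: auditory features with a Hidden Markov Model (HMM) and a Layered Neural Network (LRNN); auditory features with Mel Frequency Cepstral Coefficients (MFCC) and LRNN; a vocal tract model (MFCC with HMM); and Dynamic Time Warping (DTW). The GTFB and WP front ends were also compared with each other. The system with the wavelet packet front end and a Back Propagation Neural Network (BPNN) as the recognizer achieved good recognition rates over the whole -10 to 10 dB range. Both speaker-independent and speaker-dependent recognition systems were designed, implemented and tested.

Key words: auditory-based, speech recognition, wavelet packet, neural network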
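The abstract does not give the GTFB parameters. The sketch below assumes the common textbook 4th-order gammatone filter, g(t) = t^(n-1) exp(-2π b ERB(fc) t) cos(2π fc t) with b = 1.019 and the Glasberg-Moore ERB, 16 channels spaced evenly on the ERB-rate scale between 100 Hz and 4 kHz, and per-frame log channel energy as the feature. The channel count, frequency range, frame sizes and all function names are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def erb(fc):
    """Equivalent rectangular bandwidth (Hz) at centre frequency fc (Glasberg & Moore, 1990)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def hz_to_erb_rate(f):
    return 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)

def erb_rate_to_hz(e):
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

def gammatone_ir(fc, fs, duration=0.05, order=4, b=1.019):
    """Sampled gammatone impulse response, scaled to unit peak amplitude (assumed normalisation)."""
    t = np.arange(int(duration * fs)) / fs
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * erb(fc) * t) * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(g))

def gtfb_features(x, fs, n_channels=16, f_low=100.0, f_high=4000.0, frame=256, hop=128):
    """Hypothetical GTFB front end: returns (centre freqs, log energies of shape (n_frames, n_channels))."""
    # Centre frequencies equally spaced on the ERB-rate scale.
    fcs = erb_rate_to_hz(np.linspace(hz_to_erb_rate(f_low), hz_to_erb_rate(f_high), n_channels))
    n_frames = 1 + (len(x) - frame) // hop
    feats = np.empty((n_frames, n_channels))
    for k, fc in enumerate(fcs):
        # FIR approximation of the channel filter: convolve with the truncated impulse response.
        y = np.convolve(x, gammatone_ir(fc, fs))[: len(x)]
        for i in range(n_frames):
            seg = y[i * hop : i * hop + frame]
            feats[i, k] = np.log(np.sum(seg ** 2) + 1e-10)  # small floor avoids log(0)
    return fcs, feats

# Usage: a 1 kHz tone should excite the channel centred nearest 1 kHz most strongly.
fs = 16000
x = np.sin(2 * np.pi * 1000 * np.arange(fs // 2) / fs)
fcs, feats = gtfb_features(x, fs)
best = int(feats.mean(axis=0).argmax())
```

Spacing channels on the ERB-rate scale makes them dense at low frequencies and sparse at high ones, which is the auditory property such a front end is meant to capture.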

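The abstract does not specify the wavelet or tree behind the WP front end. A wavelet packet transform splits every node, detail as well as approximation, into two half-band children, so a full tree of depth L yields 2^L equal-width sub-bands. An auditory-motivated front end would typically prune that tree so band widths grow with frequency, roughly like critical bands. This sketch instead assumes Haar filters and a uniform full tree for brevity, and uses log sub-band energy as the feature. All function names are hypothetical.

```python
import numpy as np

def haar_split(x):
    """One Haar analysis step: (approximation, detail), each half the input length."""
    x = x[: len(x) // 2 * 2]                 # drop a trailing odd sample
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # low-pass filter + decimate by 2
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # high-pass filter + decimate by 2
    return a, d

def wavelet_packet_leaves(x, depth):
    """Full wavelet packet tree to `depth`; leaves are returned in frequency order."""
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(depth):
        children = []
        for i, node in enumerate(nodes):
            a, d = haar_split(node)
            # High-pass decimation mirrors the spectrum, so in frequency order every
            # odd-indexed node has its two children swapped (Gray-code ordering).
            children.extend([a, d] if i % 2 == 0 else [d, a])
        nodes = children
    return nodes

def wp_log_energies(x, depth=3):
    """Log energy per sub-band: a simple WP feature vector of length 2**depth."""
    return np.array([np.log(np.sum(n ** 2) + 1e-10) for n in wavelet_packet_leaves(x, depth)])

# Usage: at fs = 16 kHz and depth 3 each leaf spans 1 kHz.
fs = 16000
t = np.arange(2048) / fs
x_high = np.sin(2 * np.pi * 5500 * t)   # falls in leaf 5 (5-6 kHz)
x_low = np.sin(2 * np.pi * 1500 * t)    # falls in leaf 1 (1-2 kHz)
leaves = wavelet_packet_leaves(x_high, 3)
```

Because the Haar filters are orthonormal, the leaf energies sum to the signal energy whenever the length is a multiple of 2^depth. Haar filters leak noticeably between neighbouring bands, so a longer wavelet would give sharper sub-bands.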

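The abstract says only that a neural network was trained on the auditory feature vectors with back-propagation. As a hedged illustration, the sketch below assumes a single sigmoid hidden layer, a softmax output with cross-entropy loss, and full-batch gradient descent. The paper's topology, activations, loss and learning rate are not stated in the abstract, and the class name `BPNN` and the synthetic data are hypothetical.

```python
import numpy as np

class BPNN:
    """One-hidden-layer network trained with plain back-propagation (full-batch gradient descent)."""

    def __init__(self, n_in, n_hidden, n_out, lr=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, n_out))
        self.b2 = np.zeros(n_out)
        self.lr = lr

    def _forward(self, X):
        h = 1.0 / (1.0 + np.exp(-(X @ self.W1 + self.b1)))   # sigmoid hidden layer
        z = h @ self.W2 + self.b2
        z = z - z.max(axis=1, keepdims=True)                 # numerically stable softmax
        p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
        return h, p

    def fit(self, X, y, epochs=500):
        Y = np.eye(self.W2.shape[1])[y]                      # one-hot targets
        n = len(X)
        for _ in range(epochs):
            h, p = self._forward(X)
            dz = (p - Y) / n                                 # gradient of mean cross-entropy w.r.t. logits
            dh = (dz @ self.W2.T) * h * (1.0 - h)            # back-propagate through the sigmoid
            self.W2 -= self.lr * h.T @ dz
            self.b2 -= self.lr * dz.sum(axis=0)
            self.W1 -= self.lr * X.T @ dh
            self.b1 -= self.lr * dh.sum(axis=0)
        return self

    def predict(self, X):
        return self._forward(X)[1].argmax(axis=1)

# Usage with synthetic stand-ins for 16-dimensional feature vectors of three "words".
rng = np.random.default_rng(1)
centres = rng.normal(0.0, 3.0, (3, 16))
y = np.repeat(np.arange(3), 50)
X = centres[y] + rng.normal(0.0, 1.0, (150, 16))
net = BPNN(16, 20, 3).fit(X, y)
accuracy = (net.predict(X) == y).mean()
```

In a recognizer, each row of `X` would be one fixed-length feature vector per utterance (for instance GTFB or WP energies summarised over frames) and `y` the word index. Here random clusters only demonstrate that the update rules train correctly.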

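Recognition rate was reported at SNRs from -10 to 10 dB. The abstract does not say how the noisy test signals were produced. A common approach, assumed here, is to scale a noise signal so that SNR = 10 log10(P_speech / P_noise) equals the target. At -10 dB the noise power is therefore ten times the speech power, and at 10 dB it is one tenth. `mix_at_snr` and `snr_db` are hypothetical helper names.

```python
import numpy as np

def mix_at_snr(clean, noise, target_db):
    """Return clean + scaled noise so that 10*log10(P_clean / P_noise) equals target_db."""
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)[: len(clean)]   # assumes noise is at least as long as clean
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Solve p_clean / (scale**2 * p_noise) = 10**(target_db / 10) for scale.
    scale = np.sqrt(p_clean / (p_noise * 10.0 ** (target_db / 10.0)))
    return clean + scale * noise

def snr_db(clean, noisy):
    """Measured SNR in dB, treating noisy - clean as the noise."""
    noise = noisy - clean
    return 10.0 * np.log10(np.mean(clean ** 2) / np.mean(noise ** 2))

# Usage: build test signals over the -10 to 10 dB range.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)
noise = rng.normal(size=8000)
noisy_sets = {s: mix_at_snr(clean, noise, s) for s in (-10, -5, 0, 5, 10)}
```

Scaling the noise rather than the speech keeps the speech level fixed across conditions, so only the noise level varies between test sets.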
