Neural networks used for speech recognition

Proceedings of the Nineteenth National Radio Science Conference. Pub Date: 2002-11-07. DOI: 10.1109/NRSC.2002.1022622

S. El-Ramly, N. Abdel-Kader, R. El-Adawi


Citations: 24

Abstract

Neural networks are applied to the recognition of Arabic phonemes. Time delay neural networks (TDNNs) were chosen for Arabic speech recognition because of their ability to represent relationships between acoustic events. Two Arabic phoneme categories, nasals and voiced stops, were chosen to evaluate the performance of TDNNs. The effect of several factors on the recognition rate was studied: (i) the length of the analysis frame (10 ms versus 20 ms); (ii) truncating or zero-padding the signal versus re-sampling it to obtain a signal of fixed length; (iii) using a single neural network versus a separate neural network for each phoneme category; and (iv) the size of the TDNN. The 20 ms analysis frame gave higher recognition rates than the 10 ms frame. Truncating or zero-padding the signal to a fixed length gave higher recognition rates than re-sampling it. Using a separate TDNN for each phoneme category increased the recognition rate by 9% for nasals and 2.4% for stops. It was also found that the size of the network depends on the complexity of the recognition problem.
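The two ways of obtaining a fixed-length signal, and the two analysis frame lengths, can be illustrated with a minimal sketch. The abstract does not give the sampling rate, the re-sampling method, or whether frames overlap, so the 16 kHz rate, the linear interpolation, the non-overlapping frames, and all function names below are assumptions made for illustration only and are not the authors' implementation.

```python
from typing import List


def pad_or_truncate(signal: List[float], target_len: int) -> List[float]:
    """Option A: drop the tail or append zeros to reach target_len samples."""
    if len(signal) >= target_len:
        return list(signal[:target_len])
    return list(signal) + [0.0] * (target_len - len(signal))


def resample_linear(signal: List[float], target_len: int) -> List[float]:
    """Option B: stretch or compress to target_len samples.

    Linear interpolation is an assumed method; the abstract does not
    say how the re-sampling was done.
    """
    if target_len <= 0 or not signal:
        return [0.0] * max(target_len, 0)
    if len(signal) == 1 or target_len == 1:
        return [float(signal[0])] * target_len
    scale = (len(signal) - 1) / (target_len - 1)
    out = []
    for i in range(target_len):
        pos = i * scale                      # fractional index into the input
        lo = int(pos)
        hi = min(lo + 1, len(signal) - 1)
        frac = pos - lo
        out.append(signal[lo] * (1.0 - frac) + signal[hi] * frac)
    return out


def split_frames(signal: List[float], sample_rate: int, frame_ms: int) -> List[List[float]]:
    """Cut the signal into non-overlapping frames of frame_ms milliseconds."""
    frame_len = sample_rate * frame_ms // 1000
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    n_frames = len(signal) // frame_len
    return [signal[k * frame_len:(k + 1) * frame_len] for k in range(n_frames)]


sample_rate = 16000                               # assumed, not from the abstract
phoneme = [float(n % 7) for n in range(1000)]     # synthetic stand-in for a recorded phoneme
fixed_a = pad_or_truncate(phoneme, 1600)          # 100 ms at the assumed rate
fixed_b = resample_linear(phoneme, 1600)
print(len(split_frames(fixed_a, sample_rate, 10)),
      len(split_frames(fixed_a, sample_rate, 20)))  # prints "10 5"
```

Padding or truncating keeps the original time scale of the phoneme, whereas re-sampling changes it. The sketch only shows the difference between the two preprocessing options and makes no claim about why one gave higher recognition rates in the study.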
