Neural networks used for speech recognition

S. El-Ramly, N. Abdel-Kader, R. El-Adawi
Proceedings of the Nineteenth National Radio Science Conference. Published 2002-11-07.
DOI: 10.1109/NRSC.2002.1022622 (https://doi.org/10.1109/NRSC.2002.1022622)
Citations: 24

Abstract

Neural networks are applied to the recognition of Arabic phonemes. Time delay neural networks (TDNNs) were chosen for Arabic speech recognition because of their ability to represent relationships between acoustic events. Two Arabic phoneme categories, nasals and voiced stops, were chosen to evaluate the performance of TDNNs on this task. The effect of several factors on the recognition rate was studied: (i) the length of the analysis frame (10 ms versus 20 ms); (ii) truncating or zero-padding the signal versus re-sampling it to obtain a signal of fixed length; (iii) using a single neural network versus a separate neural network for each phoneme category; and (iv) the size of the TDNN. The 20 ms analysis frame gave higher recognition rates than the 10 ms frame. Truncating or zero-padding the signal to a fixed length gave higher recognition rates than re-sampling it. Using a separate TDNN for each phoneme category gave a 9% increase for nasals and a 2.4% increase for stops. It was also found that the size of the network depends on the complexity of the recognition problem.
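The abstract does not give implementation details, so the following is only a minimal sketch of the preprocessing choices it compares, together with a single time-delay layer. The function names, the 16 kHz sample rate, the non-overlapping frames, the use of linear interpolation for re-sampling, the sigmoid activation, and all array sizes are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def pad_or_truncate(signal, target_len):
    """Force a 1-D signal to target_len by cutting the tail or appending zeros."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) >= target_len:
        return signal[:target_len]
    return np.concatenate([signal, np.zeros(target_len - len(signal))])

def resample_linear(signal, target_len):
    """Stretch or squeeze a 1-D signal to target_len by linear interpolation
    (an assumed stand-in for whatever re-sampling the authors used)."""
    signal = np.asarray(signal, dtype=float)
    old_t = np.linspace(0.0, 1.0, num=len(signal))
    new_t = np.linspace(0.0, 1.0, num=target_len)
    return np.interp(new_t, old_t, signal)

def split_frames(signal, sample_rate, frame_ms):
    """Cut a signal into non-overlapping analysis frames of frame_ms milliseconds."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    return np.reshape(signal[:n_frames * frame_len], (n_frames, frame_len))

def time_delay_layer(features, weights, bias):
    """One time-delay layer: each output frame sees a window of consecutive input frames.

    features: (n_frames, n_in), weights: (delays, n_in, n_out), bias: (n_out,)
    """
    delays = weights.shape[0]
    n_out_frames = features.shape[0] - delays + 1
    out = np.empty((n_out_frames, weights.shape[2]))
    for t in range(n_out_frames):
        window = features[t:t + delays]  # shape (delays, n_in)
        out[t] = np.tensordot(window, weights, axes=([0, 1], [0, 1])) + bias
    return 1.0 / (1.0 + np.exp(-out))  # sigmoid activation (assumed)

if __name__ == "__main__":
    sample_rate = 16000                      # hypothetical sample rate
    utterance = np.random.randn(14500)       # hypothetical variable-length recording
    fixed = pad_or_truncate(utterance, sample_rate)
    print(split_frames(fixed, sample_rate, 10).shape)  # 10 ms frames
    print(split_frames(fixed, sample_rate, 20).shape)  # 20 ms frames
```

Padding or truncating leaves the time scale of the recording untouched, whereas re-sampling stretches or compresses it. That difference is one plausible reason the two approaches could lead to different recognition rates, although the abstract itself does not give an explanation.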