{"title":"Neural networks used for speech recognition","authors":"S. El-Ramly, N. Abdel-Kader, R. El-Adawi","doi":"10.1109/NRSC.2002.1022622","journal":{"name":"Proceedings of the Nineteenth National Radio Science Conference"},"publicationDate":"2002-11-07","citationCount":"24","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1109/NRSC.2002.1022622"}
Neural networks are applied to the recognition of Arabic phonemes. Time delay neural networks (TDNN) were chosen for Arabic speech recognition because of their ability to represent relationships between acoustic events. Two Arabic phoneme categories, nasals and voiced stops, were chosen to evaluate the performance of TDNN in Arabic speech recognition. The effects of several factors on the recognition rate were studied: (i) the length of the analysis frame (10 ms versus 20 ms); (ii) truncating or zero padding the signal versus re-sampling it to obtain a signal of fixed length; (iii) using a single neural network versus a separate neural network for each phoneme category; and (iv) the size of the TDNN. The 20 ms analysis frame yielded higher recognition rates than the 10 ms frame. Truncating or zero padding the signal to obtain a fixed-length signal gave higher recognition rates than re-sampling the signal. Using a separate TDNN for each phoneme category increased the recognition rate by 9% for nasals and 2.4% for stops. It was also found that the required size of the network depends on the complexity of the recognition problem.
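
The two fixed-length strategies and the two frame lengths compared in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the sampling rate, the frame overlap, or the re-sampling method, so the 8000 Hz rate, the non-overlapping frames, the linear interpolation, and all function names below are assumptions made only for illustration.

```python
from typing import List, Sequence


def pad_or_truncate(signal: Sequence[float], target_len: int) -> List[float]:
    """Fix the length by cutting the tail or appending zeros (time scale unchanged)."""
    clipped = list(signal[:target_len])
    return clipped + [0.0] * (target_len - len(clipped))


def resample_linear(signal: Sequence[float], target_len: int) -> List[float]:
    """Fix the length by linear interpolation (assumed method; stretches or compresses time)."""
    n = len(signal)
    if n == 0 or target_len <= 0:
        return [0.0] * max(target_len, 0)
    if n == 1 or target_len == 1:
        return [float(signal[0])] * target_len
    out = []
    for i in range(target_len):
        pos = i * (n - 1) / (target_len - 1)  # fractional index into the input
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append((1.0 - frac) * signal[lo] + frac * signal[hi])
    return out


def split_frames(signal: Sequence[float], sample_rate: int, frame_ms: int) -> List[List[float]]:
    """Cut the signal into non-overlapping analysis frames of frame_ms milliseconds."""
    frame_len = sample_rate * frame_ms // 1000
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    n_frames = len(signal) // frame_len
    return [list(signal[k * frame_len:(k + 1) * frame_len]) for k in range(n_frames)]


if __name__ == "__main__":
    sample_rate = 8000  # assumed; not stated in the abstract
    token = [float(i % 7) for i in range(1000)]  # stand-in for one phoneme token
    fixed = pad_or_truncate(token, 1600)  # 200 ms at the assumed rate
    print(len(split_frames(fixed, sample_rate, 10)))  # 20 frames of 80 samples
    print(len(split_frames(fixed, sample_rate, 20)))  # 10 frames of 160 samples
```

Padding or truncating leaves the time scale of the token untouched, whereas re-sampling warps it so that every token spans the same number of samples. That difference is one plausible reason for the gap in recognition rates reported above, though the abstract itself does not state the cause.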