{"title":"Recognition of English speech – using a deep learning algorithm","authors":"Shuyan Wang","doi":"10.1515/jisys-2022-0236","DOIUrl":null,"url":null,"abstract":"Abstract The accurate recognition of speech is beneficial to the fields of machine translation and intelligent human–computer interaction. After briefly introducing speech recognition algorithms, this study proposed to recognize speech with a recurrent neural network (RNN) and adopted the connectionist temporal classification (CTC) algorithm to align input speech sequences and output text sequences forcibly. Simulation experiments compared the RNN-CTC algorithm with the Gaussian mixture model–hidden Markov model and convolutional neural network-CTC algorithms. The results demonstrated that the more training samples the speech recognition algorithm had, the higher the recognition accuracy of the trained algorithm was, but the training time consumption increased gradually; the more samples a trained speech recognition algorithm had to test, the lower the recognition accuracy and the longer the testing time. The proposed RNN-CTC speech recognition algorithm always had the highest accuracy and the lowest training and testing time among the three algorithms when the number of training and testing samples was the same.","PeriodicalId":46139,"journal":{"name":"Journal of Intelligent Systems","volume":"8 1","pages":""},"PeriodicalIF":2.1000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Intelligent Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1515/jisys-2022-0236","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 1
Abstract
Abstract The accurate recognition of speech is beneficial to the fields of machine translation and intelligent human–computer interaction. After briefly introducing speech recognition algorithms, this study proposed to recognize speech with a recurrent neural network (RNN) and adopted the connectionist temporal classification (CTC) algorithm to align input speech sequences and output text sequences forcibly. Simulation experiments compared the RNN-CTC algorithm with the Gaussian mixture model–hidden Markov model and convolutional neural network-CTC algorithms. The results demonstrated that the more training samples the speech recognition algorithm had, the higher the recognition accuracy of the trained algorithm was, but the training time consumption increased gradually; the more samples a trained speech recognition algorithm had to test, the lower the recognition accuracy and the longer the testing time. The proposed RNN-CTC speech recognition algorithm always had the highest accuracy and the lowest training and testing time among the three algorithms when the number of training and testing samples was the same.
期刊介绍:
The Journal of Intelligent Systems aims to provide research and review papers, as well as Brief Communications at an interdisciplinary level, with the field of intelligent systems providing the focal point. This field includes areas like artificial intelligence, models and computational theories of human cognition, perception and motivation; brain models, artificial neural nets and neural computing. It covers contributions from the social, human and computer sciences to the analysis and application of information technology.