Neural-network architecture for linear and nonlinear predictive hidden Markov models: application to speech recognition
L. Deng, K. Hassanein, M. Elmasry
Neural Networks for Signal Processing Proceedings of the 1991 IEEE Workshop, 1991-09-30
DOI: 10.1109/NNSP.1991.239500 (https://doi.org/10.1109/NNSP.1991.239500)
Citation count: 10
Abstract
A speech recognizer is developed using a layered neural network to implement speech-frame prediction and using a Markov chain to modulate the network's weight parameters. The authors postulate that speech recognition accuracy is closely linked to the capability of the predictive model in representing long-term temporal correlations in data. Analytical expressions are obtained for the correlation functions for various types of predictive models (linear, nonlinear, and jointly linear and nonlinear) in order to determine the faithfulness of the models to the actual speech data. The analytical results, computer simulations, and speech recognition experiments suggest that when nonlinear and linear prediction are jointly performed within the same layer of the neural network, the model is better able to capture long-term data correlations and consequently improve speech recognition performance.
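The idea of a state-modulated predictor with linear and nonlinear units in the same layer can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no layer sizes, activation function, or training procedure, so the `tanh` activation, the parameter names (`W1`, `b`, `W2`), the `n_linear` split, and the helper names `predict_frame` and `path_prediction_error` are all assumptions made for illustration. Each Markov state owns one set of weights, and the hidden layer mixes identity (linear) units with `tanh` (nonlinear) units.

```python
import numpy as np


def predict_frame(prev_frames, params, n_linear):
    """Predict the next frame from the previous frames with one state's weights.

    prev_frames: array of shape (p, d), the p most recent frames.
    params: dict with 'W1' of shape (h, p*d), 'b' of shape (h,), 'W2' of shape (d, h).
    n_linear: number of hidden units kept linear (assumed split, not from the paper).
    """
    x = np.asarray(prev_frames, dtype=float).reshape(-1)
    z = params["W1"] @ x + params["b"]
    # Same hidden layer: the first n_linear units are linear, the rest are tanh units.
    hidden = np.concatenate([z[:n_linear], np.tanh(z[n_linear:])])
    return params["W2"] @ hidden


def path_prediction_error(frames, state_path, state_params, p, n_linear):
    """Sum of squared prediction errors along a given Markov state path.

    The state at time t selects which weight set predicts frame t, which is
    one simple reading of "a Markov chain modulating the network's weights".
    """
    frames = np.asarray(frames, dtype=float)
    total = 0.0
    for t in range(p, len(frames)):
        pred = predict_frame(frames[t - p:t], state_params[state_path[t]], n_linear)
        total += float(np.sum((frames[t] - pred) ** 2))
    return total


def identity_params(d, h):
    """Hypothetical weights (p = 1) that copy the previous frame through linear units."""
    W1 = np.zeros((h, d))
    W1[:d, :d] = np.eye(d)
    W2 = np.zeros((d, h))
    W2[:d, :d] = np.eye(d)
    return {"W1": W1, "b": np.zeros(h), "W2": W2}
```

In a recognizer built along these lines, the accumulated prediction error would typically play the role of the state-conditional score during decoding; how the scores are actually combined with the transition probabilities is left open here because the abstract does not specify it.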