A Survey of Speech Recognition Based on Deep Learning
Authors: Youyao Liu, Jiale Chen, Jialei Gao, Shihao Gai
DOI: https://doi.org/10.1109/icnlp58431.2023.00034
Listed venue: Icon, pp. 151-156
Published: 2023-03-01
Abstract
Artificial intelligence is a bellwether of scientific and technological development and of lifestyle change in the 21st century, and speech recognition, as one of its indispensable technologies, has naturally drawn wide attention. Traditional speech recognition faces two problems: first, its recognition performance has been difficult to improve significantly, and second, such systems cannot accurately extract data and features. To address these problems, this paper first takes the traditional GMM-HMM speech recognition model as a baseline for comparison and establishes a DNN-HMM model, which improves recognition speed and greatly improves the recognition rate. However, DNN-HMM cannot use historical information to assist the current task. To address this limitation, a second model, the LSTM, is introduced to compensate for insufficient contextual information, which further improves recognition performance. Then, to mitigate the loss of long-range memory and to speed up training, the Transformer model is introduced. Finally, because a traditional language model can only predict the next word in one direction, the BERT model, which is a bidirectional language model, is introduced.
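The contrast between a one-directional language model and a bidirectional one can be illustrated with a small sketch. The code below is not taken from the paper; the function names (`causal_mask`, `bidirectional_mask`, `visible_context`) and the example sentence are hypothetical, chosen only for illustration. It is also a simplification: a real BERT model hides the token being predicted and learns attention weights, whereas this sketch only shows which positions each kind of model is allowed to see.

```python
def causal_mask(n):
    """Unidirectional mask: position i may see position j only when j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Bidirectional mask: every position may see every other position."""
    return [[True] * n for _ in range(n)]


def visible_context(tokens, mask, i):
    """Return the tokens that position i is allowed to see under the mask."""
    return [tok for j, tok in enumerate(tokens) if mask[i][j]]


# Hypothetical example sentence, used only to show the difference.
tokens = ["speech", "recognition", "is", "useful"]

# Context available at position 1 ("recognition") under each scheme.
left_only = visible_context(tokens, causal_mask(len(tokens)), 1)
both_sides = visible_context(tokens, bidirectional_mask(len(tokens)), 1)

print(left_only)   # only tokens at or before position 1
print(both_sides)  # tokens on both sides of position 1
```

Under the causal mask, the model at position 1 has access only to the left context, while the bidirectional mask exposes the words on both sides, which is the property the abstract attributes to BERT.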