End-to-end Speech Recognition Based on BGRU-CTC
Authors: Yu Yan, Xizhong Shen
Published in: Proceedings of the 2021 4th International Conference on Sensors, Signal and Image Processing
Publication date: 2021-10-15
DOI: 10.1145/3502814.3502822 (https://doi.org/10.1145/3502814.3502822)
Citations: 0
Abstract
In recent years, end-to-end models have become the prevailing trend in large-scale continuous speech recognition because they are simple and easy to train. In this paper, we build on the strong speech recognition performance of the bidirectional gated recurrent unit (BGRU), a variant of long short-term memory (LSTM). We train the model with the connectionist temporal classification (CTC) algorithm to build an end-to-end speech recognition system, and we carry out speech recognition experiments on TIMIT. The results show that the end-to-end model improves accuracy by about 2.4% over the traditional recognition model.
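The abstract does not spell out how CTC scores an utterance, so the following is only a minimal sketch of the standard CTC formulation, not the authors' implementation. It assumes the network (here, the BGRU) has already produced a per-frame probability distribution over the labels plus a blank symbol. The names `ctc_collapse`, `ctc_probability`, and `ctc_loss`, and the choice of index 0 for the blank, are illustrative assumptions.

```python
import math

BLANK = 0  # assumed index of the CTC blank symbol (illustrative choice)


def ctc_collapse(path, blank=BLANK):
    """Map a frame-level path to a label sequence: merge repeats, then drop blanks."""
    out = []
    prev = None
    for symbol in path:
        if symbol != prev and symbol != blank:
            out.append(symbol)
        prev = symbol
    return out


def ctc_probability(probs, target, blank=BLANK):
    """Sum the probability of every frame-level path that collapses to `target`.

    probs:  list of T per-frame distributions, each a list indexed by label.
    target: list of label indices, without blanks.
    Uses the CTC forward recursion over the blank-extended target.
    """
    # Extended target: blank, l1, blank, l2, ..., blank
    ext = [blank]
    for label in target:
        ext += [label, blank]
    num_states = len(ext)

    # Initialisation: a path may start in the first blank or the first label.
    alpha = [0.0] * num_states
    alpha[0] = probs[0][ext[0]]
    if num_states > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new_alpha = [0.0] * num_states
        for s in range(num_states):
            total = alpha[s]                      # stay in the same state
            if s >= 1:
                total += alpha[s - 1]             # advance by one state
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new_alpha[s] = total * probs[t][ext[s]]
        alpha = new_alpha

    # A valid path ends in the last label or the trailing blank.
    result = alpha[num_states - 1]
    if num_states > 1:
        result += alpha[num_states - 2]
    return result


def ctc_loss(probs, target, blank=BLANK):
    """Negative log-likelihood of `target` under the frame distributions."""
    p = ctc_probability(probs, target, blank)
    return math.inf if p == 0.0 else -math.log(p)


if __name__ == "__main__":
    # Two frames, two symbols (blank=0, label=1), uniform outputs.
    uniform = [[0.5, 0.5], [0.5, 0.5]]
    print(ctc_probability(uniform, [1]))
    print(ctc_collapse([1, 1, 0, 1, 2, 2, 0]))
```

Because the loss sums over all alignments, training needs no frame-level labels, which is what makes an end-to-end setup like the one described possible. A practical system would run the recursion in log space for numerical stability; this sketch uses plain probabilities for readability.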