Tibetan Speech Recognition Based on Multipath Convolutional Neural Network
Zhenye Gan, Tingting Li, Hanwen Guo
2023 2nd Asia Conference on Electrical, Power and Computer Engineering (EPCE), 2023-04-01
DOI: 10.1109/epce58798.2023.00047 (https://doi.org/10.1109/epce58798.2023.00047)
Citations: 0
Abstract
In recent years, end-to-end automatic speech recognition (ASR) has come into wide use. Many researchers have studied and improved convolutional neural networks (CNNs), but most improvements have been made in the depth direction and overlook detailed features that can be captured in the width direction. This paper therefore proposes an MCNN-CTC-based method for Tibetan speech recognition. The multipath convolutional neural network (MCNN) captures more detailed features by increasing the width of the network. To study the effect of model depth and width on the recognition rate, MCNN-CTC is compared against DCNN-CTC as the baseline model. The results show that, compared with DCNN-CTC, the relative error rate of MCNN-CTC decreases by 7.72% when syllables are used as the modeling unit and by 7.79% when words are used as the modeling unit, which verifies the validity of the model.
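
The reported improvements are relative figures, so they depend on the baseline error rate. The sketch below shows how such a figure is usually computed. It assumes the common definition, (baseline − new) / baseline, which the abstract does not spell out. The function name `relative_error_reduction` and both error rates are hypothetical and are not values from the paper.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Return the relative reduction in error rate, as a percentage.

    Assumes the common definition (baseline - new) / baseline; the paper's
    exact formula is not given in the abstract.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error * 100.0


# Hypothetical error rates, chosen only to illustrate the arithmetic.
# They are NOT the figures reported in the paper.
baseline = 0.40   # stand-in for a DCNN-CTC-style baseline
candidate = 0.37  # stand-in for an MCNN-CTC-style model

absolute_points = (baseline - candidate) * 100
relative_percent = relative_error_reduction(baseline, candidate)

print(f"absolute reduction: {absolute_points:.2f} percentage points")
print(f"relative reduction: {relative_percent:.2f}%")
```

Under this definition, a relative reduction of about 7.7% corresponds to a smaller absolute drop in error rate whenever the baseline error is below 100%.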