Tibetan Speech Recognition Based on Multipath Convolutional Neural Network

Zhenye Gan, Tingting Li, Hanwen Guo
DOI: 10.1109/epce58798.2023.00047
Venue: 2023 2nd Asia Conference on Electrical, Power and Computer Engineering (EPCE)
Publication date: 2023-04-01
Link: https://doi.org/10.1109/epce58798.2023.00047
Citations: 0

Abstract

In recent years, end-to-end automatic speech recognition (ASR) has come into wide use. Many researchers have studied and improved convolutional neural networks (CNNs), but most improvements target network depth and overlook detailed features available along the width direction. This paper therefore proposes an MCNN-CTC based method for Tibetan speech recognition. The multipath convolutional neural network (MCNN) captures more detailed features by increasing the width of the network. To study how model depth and width affect the recognition rate, MCNN-CTC is compared against DCNN-CTC as the baseline model. The results show that, compared with DCNN-CTC, the relative error rate of MCNN-CTC decreases by 7.72% when syllables are used as modeling units and by 7.79% when words are used as modeling units, which supports the validity of the model.
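The abstract does not specify the MCNN layer configuration, so the following is only a minimal sketch of what "increasing the width" of a convolutional network can look like: several convolution paths read the same input in parallel and their outputs are stacked as channels. The function names, the kernels, and the toy input are all hypothetical and chosen for illustration; this is not the authors' architecture, and the CTC output layer is not shown.

```python
from typing import List, Sequence


def conv1d_same(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """1-D cross-correlation with zero padding; output length equals input length."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(signal) + [0.0] * (k - 1 - pad)
    return [
        sum(padded[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal))
    ]


def relu(values: Sequence[float]) -> List[float]:
    """Element-wise rectifier: negative values become 0.0."""
    return [max(0.0, v) for v in values]


def multipath_block(
    signal: Sequence[float], kernels: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Apply every kernel to the same input (parallel paths) and stack the
    activations as separate channels. More paths means a wider block."""
    return [relu(conv1d_same(signal, kernel)) for kernel in kernels]


# Hypothetical toy input (e.g. one feature tracked over five frames).
frame_values = [0.0, 1.0, 2.0, 1.0, 0.0]

# Hypothetical kernels, one per path; a trained network would learn these.
paths = [
    [1.0],              # width-1 path: passes the input through
    [1.0, 1.0, 1.0],    # width-3 path: local sum
    [-1.0, 0.0, 1.0],   # width-3 path: local slope
]

features = multipath_block(frame_values, paths)
for channel in features:
    print(channel)
```

Each path sees the same frames but responds to a different local pattern, which is one way a wider block can expose detail that a single path of the same depth would miss.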