Tibetan Speech Recognition Based on Multipath Convolutional Neural Network

2023 2nd Asia Conference on Electrical, Power and Computer Engineering (EPCE); Pub Date: 2023-04-01; DOI: 10.1109/epce58798.2023.00047

Zhenye Gan, Tingting Li, Hanwen Guo



Abstract


End-to-end automatic speech recognition (ASR) has become widely used in recent years. Much research has gone into improving convolutional neural networks (CNNs), but most of that work makes the network deeper and overlooks detailed features that can be captured along the width dimension. This paper therefore proposes an MCNN-CTC-based method for Tibetan speech recognition, in which a multipath convolutional neural network (MCNN) acquires more detailed features by increasing the width of the network. To study how model depth and width affect the speech recognition rate, MCNN-CTC is compared against a DCNN-CTC baseline. With syllables as the modeling unit, MCNN-CTC lowers the error rate by a relative 7.72% compared with DCNN-CTC; with words as the modeling unit, the relative reduction is 7.79%. These results support the validity of the model.
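The figures above are relative reductions, not absolute error rates. As a minimal sketch of that arithmetic, the Python below converts between the two. The baseline error rate of 0.30 is a hypothetical value chosen only for illustration, since the abstract does not report absolute error rates, and the function names are likewise illustrative and not taken from the paper.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Return the reduction in error rate as a fraction of the baseline."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


def error_after_relative_reduction(baseline_error: float,
                                   relative_reduction: float) -> float:
    """Return the error rate implied by a baseline and a relative reduction."""
    return baseline_error * (1.0 - relative_reduction)


# Hypothetical DCNN-CTC baseline error rate (NOT reported in the abstract).
hypothetical_dcnn_error = 0.30

# Relative reductions stated in the abstract.
syllable_reduction = 0.0772
word_reduction = 0.0779

# Error rates MCNN-CTC would have under the hypothetical baseline.
implied_syllable_error = error_after_relative_reduction(
    hypothetical_dcnn_error, syllable_reduction)
implied_word_error = error_after_relative_reduction(
    hypothetical_dcnn_error, word_reduction)

print(f"syllable units: {implied_syllable_error:.4f}")
print(f"word units:     {implied_word_error:.4f}")
```

Under this assumed baseline, a 7.72% relative reduction corresponds to an absolute drop of only about 2.3 percentage points, which is why the two kinds of figures should not be confused when comparing models.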
