An End-to-End Uyghur Speech Recognition Based on Multi-task Learning

Chuang Liu, Qiong Li, Zhiwei You
DOI: 10.1109/ICARCE55724.2022.10046434 (https://doi.org/10.1109/ICARCE55724.2022.10046434)
Published in: 2022 International Conference on Automation, Robotics and Computer Engineering (ICARCE)
Publication date: 2022-12-16
Citations: 1

Abstract

With the emergence of deep learning, speech recognition research for widely spoken languages such as English and Chinese has become reasonably mature. Uyghur speech recognition research, however, has developed slowly: because Uyghur is a low-resource language, modeling quality remains subpar. To address this problem, we propose an end-to-end Uyghur speech recognition model based on multi-task learning with Branchformer. Branchformer captures both global and local context, allowing it to learn richer features from small amounts of data and to improve model performance in resource-constrained settings. The multi-task learning model makes fuller use of the data through multiple related tasks, enhancing the model’s generalizability under low-resource conditions. In this paper, the multi-task learning model is trained and decoded using phoneme, sub-word, and word modeling units, and its performance is then evaluated on several datasets. The results show that the proposed model outperforms mainstream models in recognition performance: on the self-built database and the open-source corpus THUYG-20, its word accuracy reaches 94.48% and 91.16%, respectively, a significant improvement over each benchmark model.
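The abstract reports results as word accuracy but does not define the metric. As a rough illustration only, the sketch below assumes word accuracy means 100 × (1 − WER), with the word error rate computed from the Levenshtein distance between whitespace-split reference and hypothesis transcripts. The function names `edit_distance` and `word_accuracy` are hypothetical and are not taken from the paper, which may use a different definition (for example, one that ignores insertions).

```python
from typing import List


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance between two token sequences.

    Substitutions, insertions, and deletions each cost 1.
    """
    # prev[j] holds the distance between the first i-1 reference tokens
    # and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_accuracy(refs: List[str], hyps: List[str]) -> float:
    """Corpus-level word accuracy in percent.

    Assumption: accuracy = 100 * (1 - WER), pooled over all utterances.
    """
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    errors = 0
    total = 0
    for ref, hyp in zip(refs, hyps):
        ref_words = ref.split()
        errors += edit_distance(ref_words, hyp.split())
        total += len(ref_words)
    if total == 0:
        raise ValueError("reference transcripts contain no words")
    return 100.0 * (1.0 - errors / total)
```

For example, `word_accuracy(["a b c d", "e f"], ["a x c d", "e f"])` counts one substitution against six reference words. Pooling errors over the whole corpus, rather than averaging per-utterance scores, keeps long utterances from being under-weighted.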