An Investigation of Using Hybrid Modeling Units for Improving End-to-End Speech Recognition System

Shunfei Chen, Xinhui Hu, Sheng Li, Xinkang Xu
Published in: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Publication date: 2021-06-06
DOI: 10.1109/ICASSP39728.2021.9414598 (https://doi.org/10.1109/ICASSP39728.2021.9414598)
Citations: 7

Abstract

The acoustic modeling unit is crucial for an end-to-end speech recognition system, especially for Mandarin. Until now, most studies on Mandarin speech recognition have focused on individual units, and few have examined combinations of these units. This paper uses a hybrid of syllables, Chinese characters, and subwords as the modeling units for an end-to-end speech recognition system based on CTC/attention multi-task learning. In this approach, the character-subword unit is used to train the transformer model in the main task, while the syllable unit is used in the auxiliary task to enhance the transformer's shared encoder through the Connectionist Temporal Classification (CTC) loss function. Recognition experiments were conducted on AISHELL-1 and on an open 1200-hour Mandarin speech corpus collected from OpenSLR. The experimental results demonstrate that the syllable-char-subword hybrid modeling unit outperforms the conventional char-subword unit, achieving a 6.6% relative character error rate (CER) reduction on the 1200-hour data. Substitution errors are also considerably reduced.
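
The abstract reports results as a relative CER reduction and a drop in substitution errors. As a rough illustration of how such figures are usually computed, the sketch below aligns a reference and a hypothesis by edit distance, counts substitutions, deletions, and insertions, and derives the CER and a relative reduction. The function names, the example sentence, and the 10.0 / 9.34 CER values are hypothetical and are not taken from the paper; the authors' actual scoring tool is not stated in the abstract.

```python
from typing import Dict


def error_counts(ref: str, hyp: str) -> Dict[str, int]:
    """Count errors in a minimum-edit-distance alignment of hyp against ref.

    Each DP cell holds (total_edits, substitutions, deletions, insertions).
    Ties are broken in favour of the diagonal move (match or substitution).
    """
    # Row for an empty reference: every hypothesis character is an insertion.
    prev = [(j, 0, 0, j) for j in range(len(hyp) + 1)]
    for i in range(1, len(ref) + 1):
        # Empty hypothesis prefix: every reference character is a deletion.
        cur = [(i, 0, i, 0)]
        for j in range(1, len(hyp) + 1):
            t, s, d, n = prev[j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (t, s, d, n)          # match, no cost
            else:
                diag = (t + 1, s + 1, d, n)  # substitution
            t, s, d, n = prev[j]
            deletion = (t + 1, s, d + 1, n)
            t, s, d, n = cur[j - 1]
            insertion = (t + 1, s, d, n + 1)
            # min() returns the first minimal item, so diag wins ties.
            cur.append(min((diag, deletion, insertion), key=lambda c: c[0]))
        prev = cur
    _, sub, dele, ins = prev[-1]
    return {"substitutions": sub, "deletions": dele, "insertions": ins}


def cer(ref: str, hyp: str) -> float:
    """Character error rate: (S + D + I) / number of reference characters."""
    if not ref:
        raise ValueError("reference must not be empty")
    counts = error_counts(ref, hyp)
    return sum(counts.values()) / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative error reduction of `improved` with respect to `baseline`."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Hypothetical example: one wrong character and one dropped character.
    print(error_counts("今天天气很好", "今天天汽很"))
    print(f"CER = {cer('今天天气很好', '今天天汽很'):.3f}")
    # Hypothetical CERs (in percent), chosen only to illustrate the formula.
    print(f"relative reduction = {relative_reduction(10.0, 9.34):.1%}")
```

Reporting the three error types separately, rather than the total alone, is what makes a statement such as "substitution errors are reduced" checkable.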