Context-dependent Label Smoothing Regularization for Attention-based End-to-End Code-Switching Speech Recognition

2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP). Pub Date: 2021-01-24. DOI: 10.1109/ISCSLP49672.2021.9362080

Zheying Huang, Peng Li, Ji Xu, Pengyuan Zhang, Yonghong Yan

Citations: 2

Abstract

Previous works use the context-independent (CI) label smoothing regularization (LSR) method to keep an attention-based end-to-end (E2E) automatic speech recognition (ASR) model, trained with a cross-entropy loss and hard labels, from making over-confident predictions. However, CI LSR does not exploit the linguistic knowledge within and between languages that is available in code-switching speech recognition (CSSR). In this paper, we propose a context-dependent (CD) LSR method. Based on code-switching linguistic knowledge, the output units are classified into several categories, and a set of context-dependency rules is defined. Guided by these rules, the prior label distribution is generated dynamically from the category of the historical context instead of being fixed. The CD LSR method can therefore exploit CSSR-specific linguistic knowledge to further improve model performance. Experiments on the SEAME corpus demonstrate the effectiveness of the proposed method. The final system with CD LSR achieves the best performance, a mixed error rate (MER) of 37.21%, which is a relative MER reduction of up to 3.7% compared with the baseline system without LSR.
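To make the difference between CI and CD label smoothing concrete, here is a minimal Python sketch under stated assumptions. Standard label smoothing replaces the hard target with q'(k) = (1 - eps) * [k == y] + eps * u(k), where u is a uniform prior in the CI case. In the CD variant sketched below, u is instead chosen at each decoding step from the category of the previous output unit. The toy vocabulary, the three categories (zh, en, special), the weights in CATEGORY_WEIGHTS, and all function names are hypothetical illustrations; the abstract does not specify the paper's actual categories or context-dependency rules, so this is not the authors' implementation.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical toy vocabulary: each output unit is tagged with a category.
# The paper defines its own categories and rules; these are illustrative only.
VOCAB_CATEGORY: Dict[str, str] = {
    "我": "zh", "们": "zh", "去": "zh",
    "go": "en", "shop": "en", "ping": "en",
    "<eos>": "special",
}
VOCAB: List[str] = list(VOCAB_CATEGORY)
CATEGORY_SIZE = Counter(VOCAB_CATEGORY.values())

# Hypothetical context-dependency rules: prior mass per category, keyed by the
# category of the previous output unit (the "historical context").
CATEGORY_WEIGHTS: Dict[str, Dict[str, float]] = {
    "zh": {"zh": 0.7, "en": 0.2, "special": 0.1},
    "en": {"zh": 0.2, "en": 0.7, "special": 0.1},
    "special": {"zh": 0.45, "en": 0.45, "special": 0.1},
}


def uniform_prior() -> List[float]:
    """CI LSR: the same uniform prior over all units at every decoding step."""
    return [1.0 / len(VOCAB)] * len(VOCAB)


def cd_prior(prev_category: str) -> List[float]:
    """CD LSR: pick category masses from the previous unit's category, then
    split each category's mass evenly among the units in that category."""
    weights = CATEGORY_WEIGHTS[prev_category]
    return [weights[VOCAB_CATEGORY[u]] / CATEGORY_SIZE[VOCAB_CATEGORY[u]]
            for u in VOCAB]


def smoothed_target(label: str, prior: Sequence[float], eps: float) -> List[float]:
    """q'(k) = (1 - eps) * [k == label] + eps * prior(k)."""
    return [(1.0 - eps if u == label else 0.0) + eps * p
            for u, p in zip(VOCAB, prior)]


def cross_entropy(target: Sequence[float], probs: Sequence[float]) -> float:
    """-sum_k q'(k) * log p(k); zero-weight terms are skipped."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0.0)


def sequence_loss(labels: Sequence[str], step_probs: Sequence[Sequence[float]],
                  eps: float = 0.1, context_dependent: bool = True) -> float:
    """Summed smoothed cross-entropy over decoder steps under teacher forcing."""
    total = 0.0
    prev_category = "special"  # assumption: the sentence start counts as 'special'
    for label, probs in zip(labels, step_probs):
        prior = cd_prior(prev_category) if context_dependent else uniform_prior()
        total += cross_entropy(smoothed_target(label, prior, eps), probs)
        prev_category = VOCAB_CATEGORY[label]  # history = previous reference unit
    return total
```

For example, with a model distribution that puts most probability on Mandarin units, sequence_loss(["我", "们"], [probs, probs]) is smaller in CD mode than in CI mode, because after a Mandarin unit the CD prior also places most of its smoothing mass on Mandarin units. Setting eps=0.0 removes smoothing, and the two modes then give the same loss. Building the history from reference labels (teacher forcing) rather than from model hypotheses is a further assumption of this sketch.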
