Speech Recognition Method based on CTC Multilayer Loss

Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition. Pub date: 2022-11-17. DOI: 10.1145/3581807.3581864

Deyu Luo, Xianhong Chen, Mao-shen Jia, C. Bao

Citations: 0

Abstract

Because a CTC model assumes conditional independence between output labels, a language model is usually added to improve its speech recognition performance. However, adding a language model increases complexity and computation cost. We therefore propose a simple and effective speech recognition method based on a CTC multilayer loss. Whereas a conventional CTC model optimizes only the CTC loss of its last layer, this method computes CTC losses at several layers and takes their weighted sum; this CTC multilayer loss then guides model training. Optimizing the losses of different layers lets training take into account information from several layers of the CTC model rather than the last layer alone, so the model draws on more comprehensive information and achieves better recognition performance. The method requires only a small code modification, effectively regulates CTC training, and improves speech recognition performance. Because it changes only the loss function of the CTC model, and neither the model structure nor the testing procedure, training remains simple and testing incurs no extra memory or computation cost. We evaluated the method on the Aishell-1 dataset using WeNet as the baseline: without adding a language model, it reduced the character error rate (CER) by 7.5% and improved speech recognition performance.
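The abstract states the training objective only in words. Written out, the multilayer loss is a weighted sum L_multi = w_1·L_1 + w_2·L_2 + ... + w_K·L_K, where L_k is the CTC loss computed from the output of layer k and w_k is its weight. The abstract does not say which layers are used, how each layer's output is projected to output-symbol logits, or what the weights are, so the layer choice, the per-layer logits, and the weight values below are hypothetical. The following is a minimal pure-Python sketch under those assumptions: a log-space CTC forward algorithm for a single layer, and a weighted sum over layers. It is an illustration of the idea, not the authors' implementation; a real system would use a framework CTC loss such as PyTorch's `torch.nn.functional.ctc_loss` instead of these loops.

```python
import math
from typing import List, Sequence

NEG_INF = float("-inf")


def log_sum_exp(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def log_softmax(row: Sequence[float]) -> List[float]:
    """Convert one frame of raw logits into log-probabilities."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def ctc_loss(log_probs: Sequence[Sequence[float]], target: Sequence[int], blank: int = 0) -> float:
    """CTC negative log-likelihood of `target` via the forward algorithm.

    log_probs: T x V per-frame log-probabilities (T >= 1).
    target: label ids, without blanks.
    """
    if not log_probs:
        raise ValueError("need at least one frame")
    # Extended sequence with blanks: [blank, y1, blank, y2, ..., yL, blank].
    ext = [blank]
    for y in target:
        ext += [y, blank]
    S = len(ext)
    # alpha[s]: log-prob of all partial paths ending in ext[s] at the current frame.
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]
    for frame in log_probs[1:]:
        new_alpha = [NEG_INF] * S
        for s in range(S):
            a = alpha[s]  # stay on the same symbol
            if s >= 1:
                a = log_sum_exp(a, alpha[s - 1])  # advance by one position
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a = log_sum_exp(a, alpha[s - 2])
            new_alpha[s] = a + frame[ext[s]]
        alpha = new_alpha
    # Valid paths end on the last label or on the trailing blank.
    total = alpha[-1] if S == 1 else log_sum_exp(alpha[-1], alpha[-2])
    return -total


def multilayer_ctc_loss(
    layer_logits: Sequence[Sequence[Sequence[float]]],
    target: Sequence[int],
    weights: Sequence[float],
    blank: int = 0,
) -> float:
    """Weighted sum of per-layer CTC losses: sum_k w_k * CTC(layer k).

    layer_logits[k] is a T x V logit matrix for layer k. How those logits are
    produced (e.g. a per-layer or shared output projection) is a hypothetical
    choice here, as are the weights.
    """
    if len(layer_logits) != len(weights):
        raise ValueError("need exactly one weight per layer")
    total = 0.0
    for logits, w in zip(layer_logits, weights):
        log_probs = [log_softmax(frame) for frame in logits]
        total += w * ctc_loss(log_probs, target, blank)
    return total


if __name__ == "__main__":
    # Hypothetical example: an intermediate layer and the final layer, each with
    # T=2 frames of logits over V=3 symbols (id 0 is the blank).
    intermediate = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    final = [[0.0, 2.0, 0.0], [1.0, 1.0, 0.0]]
    loss = multilayer_ctc_loss([intermediate, final], target=[1], weights=[0.3, 0.7])
    print(f"multilayer CTC loss: {loss:.4f}")
```

Only the training objective changes in this sketch; at test time decoding would still read the final layer's output alone, which matches the abstract's claim that the method adds no memory or computation cost during testing.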
