Authors: C. Li, Ngoc Thang Vu
Venue: 2022 IEEE Spoken Language Technology Workshop (SLT)
Published: 2022-10-20
DOI: 10.1109/SLT54892.2023.10022448
Improving Semi-Supervised End-to-End Automatic Speech Recognition Using CycleGAN and Inter-Domain Losses
We propose a novel method that combines CycleGAN and inter-domain losses for semi-supervised end-to-end automatic speech recognition. The inter-domain loss targets the extraction of an intermediate shared representation of speech and text inputs using a shared network. CycleGAN uses a cycle-consistency loss and an identity mapping loss to preserve relevant characteristics of the input features after conversion from one domain to another. As such, both approaches are suitable for training end-to-end models on unpaired speech-text inputs. In this paper, we exploit the advantages of both the inter-domain loss and CycleGAN to achieve a better shared representation of unpaired speech and text inputs and thus improve the speech-to-text mapping. Our experimental results on WSJ eval92 and VoxForge (non-English) show an 8-8.5% character error rate reduction over the baseline, and the results on LibriSpeech test_clean also show noticeable improvement.
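The two CycleGAN losses named in the abstract can be illustrated in isolation. The following is a minimal sketch, not the paper's implementation: it uses toy linear maps as stand-ins for the two domain-transfer networks, and the names `G`, `F`, and the weighting `lambda_id` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the two mapping networks (illustrative only):
# G maps the "speech" domain to the "text" domain, F maps back.
W_g = np.eye(4) + 0.1 * rng.standard_normal((4, 4))
W_f = np.linalg.inv(W_g)  # near-inverse, so cycles roughly close

def G(x):  # hypothetical speech -> text mapping
    return x @ W_g

def F(y):  # hypothetical text -> speech mapping
    return y @ W_f

def cycle_consistency_loss(x, y):
    # ||F(G(x)) - x||_1 + ||G(F(y)) - y||_1 : a round trip through
    # both domains should reconstruct the original input.
    return np.abs(F(G(x)) - x).mean() + np.abs(G(F(y)) - y).mean()

def identity_mapping_loss(x, y):
    # ||G(y) - y||_1 + ||F(x) - x||_1 : feeding a mapper an input
    # already in its target domain should change it as little as
    # possible, preserving relevant input characteristics.
    return np.abs(G(y) - y).mean() + np.abs(F(x) - x).mean()

# Unpaired batches from the two domains (random toy features).
x = rng.standard_normal((8, 4))  # "speech" features
y = rng.standard_normal((8, 4))  # "text" features

lambda_id = 0.5  # assumed weighting, for illustration
total = cycle_consistency_loss(x, y) + lambda_id * identity_mapping_loss(x, y)
print(float(total))
```

Because the toy maps are exact inverses, the cycle term is near zero here; with learned networks both terms act as regularizers that let unpaired speech and text supervise each other.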