An End-to-End Language-Tracking Speech Recognizer for Mixed-Language Speech

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2018-09-10 DOI:10.1109/ICASSP.2018.8462180

Hiroshi Seki, Shinji Watanabe, Takaaki Hori, Jonathan Le Roux, J. Hershey

{"title":"An End-to-End Language-Tracking Speech Recognizer for Mixed-Language Speech","authors":"Hiroshi Seki, Shinji Watanabe, Takaaki Hori, Jonathan Le Roux, J. Hershey","doi":"10.1109/ICASSP.2018.8462180","DOIUrl":null,"url":null,"abstract":"End-to-end automatic speech recognition (ASR) can significantly reduce the burden of developing ASR systems for new languages, by eliminating the need for linguistic information such as pronunciation dictionaries. This also creates an opportunity to build a monolithic multilingual ASR system with a language-independent neural network architecture. In our previous work, we proposed a monolithic neural network architecture that can recognize multiple languages, and showed its effectiveness compared with conventional language-dependent models. However, the model is not guaranteed to properly handle switches in language within an utterance, thus lacking the flexibility to recognize mixed-language speech such as code-switching. In this paper, we extend our model to enable dynamic tracking of the language within an utterance, and propose a training procedure that takes advantage of a newly created mixed-language speech corpus. Experimental results show that the extended model outperforms both language-dependent models and our previous model without suffering from performance degradation that could be associated with language switching.","PeriodicalId":6638,"journal":{"name":"2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"54 1","pages":"4919-4923"},"PeriodicalIF":0.0000,"publicationDate":"2018-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"58","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICASSP.2018.8462180","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 58

Abstract

End-to-end automatic speech recognition (ASR) can significantly reduce the burden of developing ASR systems for new languages, by eliminating the need for linguistic information such as pronunciation dictionaries. This also creates an opportunity to build a monolithic multilingual ASR system with a language-independent neural network architecture. In our previous work, we proposed a monolithic neural network architecture that can recognize multiple languages, and showed its effectiveness compared with conventional language-dependent models. However, the model is not guaranteed to properly handle switches in language within an utterance, thus lacking the flexibility to recognize mixed-language speech such as code-switching. In this paper, we extend our model to enable dynamic tracking of the language within an utterance, and propose a training procedure that takes advantage of a newly created mixed-language speech corpus. Experimental results show that the extended model outperforms both language-dependent models and our previous model without suffering from performance degradation that could be associated with language switching.

查看原文本刊更多论文

混合语言语音的端到端语言跟踪语音识别器

端到端自动语音识别(ASR)可以通过消除对语音字典等语言信息的需求，大大减轻为新语言开发ASR系统的负担。这也为构建具有独立于语言的神经网络架构的单片多语言ASR系统创造了机会。在我们之前的工作中，我们提出了一个可以识别多种语言的单片神经网络架构，并与传统的语言依赖模型相比显示了它的有效性。然而，该模型不能保证正确处理话语内的语言切换，因此缺乏识别混合语言语音的灵活性，如代码切换。在本文中，我们扩展了我们的模型，使其能够动态跟踪话语中的语言，并提出了一个利用新创建的混合语言语音语料库的训练过程。实验结果表明，扩展模型的性能优于语言依赖模型和我们之前的模型，而不会受到可能与语言切换相关的性能下降的影响。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

自引率

0.00%

发文量