Language Model Bootstrapping Using Neural Machine Translation for Conversational Speech Recognition

Surabhi Punjabi, Harish Arsikere, S. Garimella
{"title":"会话语音识别中使用神经机器翻译的语言模型自引导","authors":"Surabhi Punjabi, Harish Arsikere, S. Garimella","doi":"10.1109/ASRU46091.2019.9003982","DOIUrl":null,"url":null,"abstract":"Building conversational speech recognition systems for new languages is constrained by the availability of utterances capturing user-device interactions. Data collection is expensive and limited by speed of manual transcription. In order to address this, we advocate the use of neural machine translation as a data augmentation technique for bootstrapping language models. Machine translation (MT) offers a systematic way of incorporating collections from mature, resource-rich conversational systems that may be available for a different language. However, ingesting raw translations from a general purpose MT system may not be effective owing to the presence of named entities, intra sentential code-switching and the domain mismatch between the conversational data being translated and the parallel text used for MT training. To circumvent this, we explore following domain adaptation techniques: (a) sentence embedding based data selection for MT training, (b) model finetuning, and (c) rescoring and filtering translated hypotheses. Using Hindi language as the experimental testbed, we supplement transcribed collections with translated US English utterances. We observe a relative word error rate reduction of 7.8-15.6%, depending on the bootstrapping phase. Fine grained analysis reveals that translation particularly aids the interaction scenarios underrepresented in the transcribed data.","PeriodicalId":150913,"journal":{"name":"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","volume":"198 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Language Model Bootstrapping Using Neural Machine Translation for Conversational Speech Recognition\",\"authors\":\"Surabhi Punjabi, Harish Arsikere, S. Garimella\",\"doi\":\"10.1109/ASRU46091.2019.9003982\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Building conversational speech recognition systems for new languages is constrained by the availability of utterances capturing user-device interactions. Data collection is expensive and limited by speed of manual transcription. In order to address this, we advocate the use of neural machine translation as a data augmentation technique for bootstrapping language models. Machine translation (MT) offers a systematic way of incorporating collections from mature, resource-rich conversational systems that may be available for a different language. However, ingesting raw translations from a general purpose MT system may not be effective owing to the presence of named entities, intra sentential code-switching and the domain mismatch between the conversational data being translated and the parallel text used for MT training. To circumvent this, we explore following domain adaptation techniques: (a) sentence embedding based data selection for MT training, (b) model finetuning, and (c) rescoring and filtering translated hypotheses. Using Hindi language as the experimental testbed, we supplement transcribed collections with translated US English utterances. We observe a relative word error rate reduction of 7.8-15.6%, depending on the bootstrapping phase. 
Fine grained analysis reveals that translation particularly aids the interaction scenarios underrepresented in the transcribed data.\",\"PeriodicalId\":150913,\"journal\":{\"name\":\"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)\",\"volume\":\"198 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASRU46091.2019.9003982\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU46091.2019.9003982","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 5

Abstract

Building conversational speech recognition systems for new languages is constrained by the availability of utterances capturing user-device interactions. Data collection is expensive and limited by the speed of manual transcription. To address this, we advocate the use of neural machine translation as a data augmentation technique for bootstrapping language models. Machine translation (MT) offers a systematic way of incorporating collections from mature, resource-rich conversational systems that may be available for a different language. However, ingesting raw translations from a general-purpose MT system may not be effective owing to the presence of named entities, intra-sentential code-switching, and the domain mismatch between the conversational data being translated and the parallel text used for MT training. To circumvent this, we explore the following domain adaptation techniques: (a) sentence embedding based data selection for MT training, (b) model fine-tuning, and (c) rescoring and filtering of translated hypotheses. Using the Hindi language as the experimental testbed, we supplement transcribed collections with translated US English utterances. We observe a relative word error rate reduction of 7.8-15.6%, depending on the bootstrapping phase. Fine-grained analysis reveals that translation particularly aids the interaction scenarios underrepresented in the transcribed data.
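The abstract describes the sentence-embedding-based data selection (technique (a)) only at a high level. The sketch below is one plausible realization rather than the paper's exact recipe: rank candidate MT training sentences by cosine similarity to the centroid of in-domain conversational utterances and keep the closest ones. The encoder model, the top-k cutoff, and the function name are illustrative assumptions.

```python
# Illustrative sketch of sentence-embedding-based data selection for MT training.
# Not the paper's exact recipe: the encoder, top_k, and helper names are assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

def select_in_domain(candidates, in_domain_utterances, top_k=100_000,
                     model_name="paraphrase-multilingual-MiniLM-L12-v2"):
    """Keep the top_k candidate sentences whose embeddings are closest
    (by cosine similarity) to the centroid of in-domain conversational text."""
    model = SentenceTransformer(model_name)

    # Encode and L2-normalize so dot products equal cosine similarities.
    cand = np.asarray(model.encode(candidates))
    cand /= np.linalg.norm(cand, axis=1, keepdims=True)
    dom = np.asarray(model.encode(in_domain_utterances))
    dom /= np.linalg.norm(dom, axis=1, keepdims=True)

    centroid = dom.mean(axis=0)
    centroid /= np.linalg.norm(centroid)

    scores = cand @ centroid                    # cosine similarity per candidate
    keep = np.argsort(-scores)[:top_k]          # indices of the closest candidates
    return [candidates[i] for i in keep]
```

Under this reading, the selected source-side sentences (together with their parallel targets) would be used to train or fine-tune the MT system before translating the English conversational utterances into Hindi.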