Language Model Bootstrapping Using Neural Machine Translation for Conversational Speech Recognition

Surabhi Punjabi, Harish Arsikere, S. Garimella
{"title":"会话语音识别中使用神经机器翻译的语言模型自引导","authors":"Surabhi Punjabi, Harish Arsikere, S. Garimella","doi":"10.1109/ASRU46091.2019.9003982","DOIUrl":null,"url":null,"abstract":"Building conversational speech recognition systems for new languages is constrained by the availability of utterances capturing user-device interactions. Data collection is expensive and limited by speed of manual transcription. In order to address this, we advocate the use of neural machine translation as a data augmentation technique for bootstrapping language models. Machine translation (MT) offers a systematic way of incorporating collections from mature, resource-rich conversational systems that may be available for a different language. However, ingesting raw translations from a general purpose MT system may not be effective owing to the presence of named entities, intra sentential code-switching and the domain mismatch between the conversational data being translated and the parallel text used for MT training. To circumvent this, we explore following domain adaptation techniques: (a) sentence embedding based data selection for MT training, (b) model finetuning, and (c) rescoring and filtering translated hypotheses. Using Hindi language as the experimental testbed, we supplement transcribed collections with translated US English utterances. We observe a relative word error rate reduction of 7.8-15.6%, depending on the bootstrapping phase. Fine grained analysis reveals that translation particularly aids the interaction scenarios underrepresented in the transcribed data.","PeriodicalId":150913,"journal":{"name":"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","volume":"198 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Language Model Bootstrapping Using Neural Machine Translation for Conversational Speech Recognition\",\"authors\":\"Surabhi Punjabi, Harish Arsikere, S. Garimella\",\"doi\":\"10.1109/ASRU46091.2019.9003982\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Building conversational speech recognition systems for new languages is constrained by the availability of utterances capturing user-device interactions. Data collection is expensive and limited by speed of manual transcription. In order to address this, we advocate the use of neural machine translation as a data augmentation technique for bootstrapping language models. Machine translation (MT) offers a systematic way of incorporating collections from mature, resource-rich conversational systems that may be available for a different language. However, ingesting raw translations from a general purpose MT system may not be effective owing to the presence of named entities, intra sentential code-switching and the domain mismatch between the conversational data being translated and the parallel text used for MT training. To circumvent this, we explore following domain adaptation techniques: (a) sentence embedding based data selection for MT training, (b) model finetuning, and (c) rescoring and filtering translated hypotheses. Using Hindi language as the experimental testbed, we supplement transcribed collections with translated US English utterances. We observe a relative word error rate reduction of 7.8-15.6%, depending on the bootstrapping phase. 
Fine grained analysis reveals that translation particularly aids the interaction scenarios underrepresented in the transcribed data.\",\"PeriodicalId\":150913,\"journal\":{\"name\":\"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)\",\"volume\":\"198 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASRU46091.2019.9003982\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU46091.2019.9003982","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 5

Abstract

Building conversational speech recognition systems for new languages is constrained by the availability of utterances capturing user-device interactions. Data collection is expensive and limited by the speed of manual transcription. To address this, we advocate the use of neural machine translation as a data augmentation technique for bootstrapping language models. Machine translation (MT) offers a systematic way of incorporating collections from mature, resource-rich conversational systems that may be available for a different language. However, ingesting raw translations from a general-purpose MT system may not be effective owing to the presence of named entities, intra-sentential code-switching, and the domain mismatch between the conversational data being translated and the parallel text used for MT training. To circumvent this, we explore the following domain adaptation techniques: (a) sentence embedding based data selection for MT training, (b) model fine-tuning, and (c) rescoring and filtering of translated hypotheses. Using the Hindi language as the experimental testbed, we supplement transcribed collections with translated US English utterances. We observe a relative word error rate reduction of 7.8-15.6%, depending on the bootstrapping phase. Fine-grained analysis reveals that translation particularly aids the interaction scenarios underrepresented in the transcribed data.
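The abstract describes the sentence-embedding-based data selection (technique (a)) only at a high level. The sketch below is one plausible realization rather than the paper's exact recipe: rank candidate MT training sentences by cosine similarity to the centroid of in-domain conversational utterances and keep the closest ones. The encoder model, the top-k cutoff, and the function name are illustrative assumptions.

```python
# Illustrative sketch of sentence-embedding-based data selection for MT training.
# Not the paper's exact recipe: the encoder, top_k, and helper names are assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

def select_in_domain(candidates, in_domain_utterances, top_k=100_000,
                     model_name="paraphrase-multilingual-MiniLM-L12-v2"):
    """Keep the top_k candidate sentences whose embeddings are closest
    (by cosine similarity) to the centroid of in-domain conversational text."""
    model = SentenceTransformer(model_name)

    # Encode and L2-normalize so dot products equal cosine similarities.
    cand = np.asarray(model.encode(candidates))
    cand /= np.linalg.norm(cand, axis=1, keepdims=True)
    dom = np.asarray(model.encode(in_domain_utterances))
    dom /= np.linalg.norm(dom, axis=1, keepdims=True)

    centroid = dom.mean(axis=0)
    centroid /= np.linalg.norm(centroid)

    scores = cand @ centroid                    # cosine similarity per candidate
    keep = np.argsort(-scores)[:top_k]          # indices of the closest candidates
    return [candidates[i] for i in keep]
```

Under this reading, the selected source-side sentences (together with their parallel targets) would be used to train or fine-tune the MT system before translating the English conversational utterances into Hindi.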