Speaker and Language Aware Training for End-to-End ASR

Shubham Bansal, Karan Malhotra, Sriram Ganapathy
{"title":"Speaker and Language Aware Training for End-to-End ASR","authors":"Shubham Bansal, Karan Malhotra, Sriram Ganapathy","doi":"10.1109/ASRU46091.2019.9004000","DOIUrl":null,"url":null,"abstract":"The end-to-end (E2E) approach to automatic speech recognition (ASR) is a simplified and an elegant approach where a single deep neural network model directly converts the acoustic feature sequence to the text sequence. The current approach to end-to-end ASR uses the neural network model (trained with sequence loss) along with an external character/word based language model (LM) in a decoding pass to output the text sequence. In this work, we propose a new objective function for end-to-end ASR training where the LM score is explicitly introduced in the attention model loss function without any additional training parameters. In this manner, the neural network is made LM aware and this simplifies the model training process. We also propose to incorporate an attention based sequence summary feature in the ASR model which allows the system to be speaker aware. With several E2E ASR experiments on TED-LIUM, WSJ and Librispeech datasets, we show that the proposed speaker and LM aware training improves the ASR performance significantly over the state-of-art E2E approaches. We achieve the best published results reported for WSJ dataset.","PeriodicalId":150913,"journal":{"name":"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","volume":"150 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU46091.2019.9004000","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

The end-to-end (E2E) approach to automatic speech recognition (ASR) is a simplified and elegant approach in which a single deep neural network model directly converts the acoustic feature sequence to the text sequence. The current approach to end-to-end ASR uses the neural network model (trained with a sequence loss) along with an external character/word-based language model (LM) in a decoding pass to output the text sequence. In this work, we propose a new objective function for end-to-end ASR training in which the LM score is explicitly introduced into the attention model loss function without any additional training parameters. In this manner, the neural network is made LM aware, which simplifies the model training process. We also propose to incorporate an attention-based sequence summary feature in the ASR model, which allows the system to be speaker aware. With several E2E ASR experiments on the TED-LIUM, WSJ and Librispeech datasets, we show that the proposed speaker and LM aware training improves ASR performance significantly over state-of-the-art E2E approaches. We achieve the best published results reported for the WSJ dataset.
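
The abstract describes two modifications to a standard attention-based E2E ASR system: an LM-aware objective, in which the external LM score enters the attention loss without adding trainable parameters, and a speaker-aware, attention-based sequence summary feature. The sketch below is a minimal illustration written from the abstract alone, not the authors' implementation; the exact formulation in the paper may differ. It assumes a frozen external LM that supplies per-token log-probabilities, a training-time shallow-fusion-style interpolation with weight `alpha`, and hypothetical tensor shapes as noted in the docstrings.

```python
# Sketch only: a plausible reading of the abstract, not the paper's exact objective.
import torch
import torch.nn.functional as F


def lm_aware_ce_loss(decoder_logits, lm_log_probs, targets, alpha=0.3, pad_id=0):
    """LM-aware cross-entropy (sketch).

    decoder_logits: (B, T, V) raw scores from the attention decoder.
    lm_log_probs:   (B, T, V) per-token log-probabilities from a frozen external LM.
    targets:        (B, T)    reference token ids, padded with pad_id.
    """
    asr_log_probs = F.log_softmax(decoder_logits, dim=-1)
    fused = asr_log_probs + alpha * lm_log_probs        # LM score enters the objective
    fused = F.log_softmax(fused, dim=-1)                # renormalise the fused scores
    # nll_loss expects (N, C, d1) inputs and (N, d1) targets
    return F.nll_loss(fused.transpose(1, 2), targets,
                      ignore_index=pad_id, reduction="mean")


def sequence_summary(encoder_out, scoring_vector):
    """Attention-based sequence summary (sketch of the speaker-aware feature).

    encoder_out:    (B, T, D) encoder hidden states.
    scoring_vector: (D,)      learned vector that scores each frame.
    Returns a (B, D) utterance-level summary that can be concatenated to each frame.
    """
    scores = torch.matmul(encoder_out, scoring_vector)      # (B, T)
    weights = torch.softmax(scores, dim=-1).unsqueeze(-1)   # (B, T, 1)
    return (weights * encoder_out).sum(dim=1)               # (B, D)
```

In this reading, the LM contributes only through its fixed log-probabilities, so the objective introduces no trainable parameters beyond the existing decoder; the sequence summary, by contrast, uses a small learned scoring vector to pool encoder frames into a per-utterance vector that can make the model speaker aware when appended to each frame.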