Monotonic Recurrent Neural Network Transducer and Decoding Strategies

Anshuman Tripathi, Han Lu, H. Sak, H. Soltau
{"title":"Monotonic Recurrent Neural Network Transducer and Decoding Strategies","authors":"Anshuman Tripathi, Han Lu, H. Sak, H. Soltau","doi":"10.1109/ASRU46091.2019.9003822","DOIUrl":null,"url":null,"abstract":"Recurrent Neural Network Transducer (RNNT) is an end-to-end model which transduces discrete input sequences to output sequences by learning alignments between the sequences. In speech recognition tasks we generally have a strictly monotonic alignment between time frames and label sequence. However, the standard RNNT loss does not enforce this constraint. This can cause some anomalies in alignments such as the model outputting a sequence of labels at a single time frame. There is also no bound on the decoding time steps. To address these problems, we introduce a monotonic version of the RNNT loss. Under the assumption that the output sequence is not longer than the input sequence, this loss can be used with forward-backward algorithm to learn strictly monotonic alignments between the sequences. We present experimental studies showing that speech recognition accuracy for monotonic RNNT is equivalent to standard RNNT. We also explore best-first and breadth-first decoding strategies for both monotonic and standard RNNT models. Our experiments show that breadth-first search is effective in exploring and combining alternative alignments. Additionally, it also allows batching of hypotheses during search label expansion, allowing better resource utilization, and resulting in decoding speedup.","PeriodicalId":150913,"journal":{"name":"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","volume":"84 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"38","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU46091.2019.9003822","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 38

Abstract

Recurrent Neural Network Transducer (RNNT) is an end-to-end model that transduces discrete input sequences to output sequences by learning alignments between them. In speech recognition tasks we generally have a strictly monotonic alignment between time frames and the label sequence. However, the standard RNNT loss does not enforce this constraint, which can cause anomalies in the alignments, such as the model outputting a sequence of labels at a single time frame; there is also no bound on the number of decoding time steps. To address these problems, we introduce a monotonic version of the RNNT loss. Under the assumption that the output sequence is no longer than the input sequence, this loss can be used with the forward-backward algorithm to learn strictly monotonic alignments between the sequences. We present experimental studies showing that speech recognition accuracy for monotonic RNNT is equivalent to that of standard RNNT. We also explore best-first and breadth-first decoding strategies for both monotonic and standard RNNT models. Our experiments show that breadth-first search is effective in exploring and combining alternative alignments. It also allows batching of hypotheses during label expansion in the search, improving resource utilization and speeding up decoding.
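The monotonic loss described in the abstract constrains every emission, blank or label, to consume exactly one time frame, so a valid alignment exists only when the label sequence is no longer than the frame sequence. Below is a minimal NumPy sketch of the forward recursion such a loss implies; it is an illustration of the idea, not the paper's implementation, and the array names are assumptions.

```python
import numpy as np

def monotonic_rnnt_nll(log_blank, log_label):
    """Negative log-likelihood under a monotonic RNNT-style loss (sketch).

    Assumed inputs (hypothetical names, not from the paper):
      log_blank[t, u]: log-prob of emitting blank at frame t after u labels,
                       shape (T, U + 1)
      log_label[t, u]: log-prob of emitting reference label u + 1 at frame t,
                       shape (T, U)
    """
    T, U_plus_1 = log_blank.shape
    U = U_plus_1 - 1
    assert U <= T, "monotonic alignment needs len(labels) <= len(frames)"

    # alpha[t, u]: log-prob of having consumed t frames and emitted u labels.
    alpha = np.full((T + 1, U + 1), -np.inf)
    alpha[0, 0] = 0.0
    for t in range(1, T + 1):
        for u in range(min(t, U) + 1):
            stay = alpha[t - 1, u] + log_blank[t - 1, u]   # blank consumed a frame
            emit = (alpha[t - 1, u - 1] + log_label[t - 1, u - 1]
                    if u > 0 else -np.inf)                 # label consumed a frame
            alpha[t, u] = np.logaddexp(stay, emit)
    return -alpha[T, U]
```

Because label emission also advances time here, an alignment has exactly T steps, which is what bounds decoding. The breadth-first decoding idea can be sketched the same way: all hypotheses advance frame-synchronously, alternative alignments of the same label prefix are merged, and the beam is pruned once per frame. The `log_post_fn` scoring helper below is a hypothetical stand-in for encoder/prediction/joint network evaluation.

```python
def breadth_first_decode(log_post_fn, T, beam_size=4, blank=0):
    """Frame-synchronous breadth-first beam search (sketch).

    log_post_fn(t, prefix) is assumed to return a log-posterior vector over
    {blank} + labels for frame t, given the labels emitted so far.
    """
    beam = {(): 0.0}  # label prefix -> log-prob
    for t in range(T):
        expanded = {}
        # Every hypothesis takes exactly one step per frame, so these
        # scoring calls can be batched; the loop is kept for clarity.
        for prefix, score in beam.items():
            log_post = log_post_fn(t, prefix)
            for k, lp in enumerate(log_post):
                new_prefix = prefix if k == blank else prefix + (k,)
                # Merge alternative alignments of the same label sequence.
                expanded[new_prefix] = np.logaddexp(
                    expanded.get(new_prefix, -np.inf), score + lp)
        beam = dict(sorted(expanded.items(), key=lambda kv: -kv[1])[:beam_size])
    return max(beam.items(), key=lambda kv: kv[1])[0]
```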