Neural Oracle Search on N-BEST Hypotheses

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7824-7828. Pub date: 2020-05-01. DOI: 10.1109/ICASSP40776.2020.9054745

Ehsan Variani, Tongzhou Chen, J. Apfel, B. Ramabhadran, Seungjin Lee, P. Moreno

Citations: 11

Abstract

In this paper, we propose a neural search algorithm that selects the most likely hypothesis, taking a sequence of acoustic representations and multiple hypotheses as input. The algorithm produces a sequence-level score for each audio-hypothesis pair by integrating information from multiple sources, such as the input acoustic representations, the N-best hypotheses, additional 1st-pass statistics, and unpaired textual information through an external language model. These scores are then used to map the search problem of identifying the most likely hypothesis to a sequence classification problem. The algorithm is defined broadly enough to be used either as an alternative to beam search in the 1st pass or as a 2nd-pass rescoring step. It achieves up to a 12% relative reduction in Word Error Rate (WER) across several languages over state-of-the-art baselines, with relatively few additional parameters. We also propose a binary classifier that acts as a gating function: it learns to trigger the 2nd-pass neural search model when the 1-best hypothesis is not the oracle hypothesis, thereby avoiding extra computation.
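The abstract frames hypothesis selection as a classification problem: score every audio-hypothesis pair, then pick which of the N hypotheses is the oracle, optionally behind a gate that decides whether the 2nd pass runs at all. The Python sketch below illustrates that framing only, under loudly stated assumptions. A linear scorer over hypothetical per-hypothesis features (a 1st-pass score and an external LM score) stands in for the paper's neural model over acoustic representations. The oracle label is assumed to be the hypothesis with the fewest word errors against the reference. The gate is a hypothetical logistic classifier over made-up 1st-pass statistics. None of the function names, features, or weights come from the paper.

```python
import math
from typing import List, Sequence


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance over words (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # delete a reference word
                         cur[j - 1] + 1,          # insert a hypothesis word
                         prev[j - 1] + (r != h))  # substitution (or match at no cost)
        prev = cur
    return prev[-1]


def oracle_index(reference: str, nbest: List[str]) -> int:
    """Assumed training label: index of the N-best entry with the fewest word errors."""
    ref = reference.split()
    errors = [word_edit_distance(ref, h.split()) for h in nbest]
    return errors.index(min(errors))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over the N hypothesis scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hypothesis_scores(features: List[List[float]], weights: List[float]) -> List[float]:
    """Hypothetical linear scorer standing in for the neural audio-hypothesis model."""
    return [sum(w * f for w, f in zip(weights, feats)) for feats in features]


def classification_loss(scores: List[float], label: int) -> float:
    """Cross-entropy of the softmax over N hypotheses against the oracle label."""
    return -math.log(softmax(scores)[label])


def gate(stats: List[float], gate_weights: List[float], bias: float,
         threshold: float = 0.5) -> bool:
    """Hypothetical logistic gate over 1st-pass statistics; True means run the 2nd pass."""
    logit = bias + sum(w * x for w, x in zip(gate_weights, stats))
    return 1.0 / (1.0 + math.exp(-logit)) > threshold


def select(nbest: List[str], features: List[List[float]], weights: List[float],
           stats: List[float], gate_weights: List[float], bias: float) -> str:
    """Return the 1-best unchanged unless the gate fires; otherwise the top-scoring hypothesis."""
    if not gate(stats, gate_weights, bias):
        return nbest[0]  # skip the 2nd pass entirely
    scores = hypothesis_scores(features, weights)
    return nbest[scores.index(max(scores))]


if __name__ == "__main__":
    nbest = ["the cat sad", "the cat sat", "a cat sat"]  # 1st-pass N-best, 1-best first
    features = [[-1.0, -5.0], [-1.2, -3.0], [-1.5, -2.5]]  # made-up [1st-pass score, LM score]
    weights = [1.0, 0.5]
    label = oracle_index("the cat sat", nbest)  # training target for the classifier
    print(classification_loss(hypothesis_scores(features, weights), label))
    # stats=[0.1] is a hypothetical small top-2 score margin; with a negative
    # gate weight, a small margin pushes the gate toward running the 2nd pass.
    print(select(nbest, features, weights, stats=[0.1], gate_weights=[-5.0], bias=1.0))
    # prints: the cat sat
```

Treating selection as N-way classification means training reduces to an ordinary cross-entropy loss against the oracle index, and the gate lets the 1-best pass through untouched when the 2nd pass is unlikely to change it.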
