Neural Oracle Search on N-BEST Hypotheses
Ehsan Variani, Tongzhou Chen, J. Apfel, B. Ramabhadran, Seungjin Lee, P. Moreno
ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7824-7828, May 2020
DOI: 10.1109/ICASSP40776.2020.9054745
Citations: 11
Abstract
In this paper, we propose a neural search algorithm that selects the most likely hypothesis given a sequence of acoustic representations and multiple hypotheses as input. The algorithm produces a sequence-level score for each audio-hypothesis pair by integrating information from multiple sources, such as the input acoustic representations, the N-best hypotheses, additional 1st-pass statistics, and unpaired textual information through an external language model. These scores are then used to recast the search for the most likely hypothesis as a sequence classification problem. The algorithm is defined broadly enough to be used either as an alternative to beam search in the 1st pass or as a 2nd-pass rescoring step. It achieves up to 12% relative reduction in Word Error Rate (WER) across several languages over state-of-the-art baselines, with relatively few additional parameters. We also propose a binary classifier gating function that learns to trigger the 2nd-pass neural search model only when the 1-best hypothesis is not the oracle hypothesis, thereby avoiding extra computation.
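The abstract does not specify the model or the training objective, so the following is only a minimal sketch of how "search as sequence classification" could be framed, under stated assumptions. It assumes (1) the oracle is the N-best entry with the fewest word errors against the reference, (2) the model emits one real-valued score per audio-hypothesis pair, here passed in as plain numbers standing in for the neural scorer, and (3) training uses softmax cross-entropy over the N scores with the oracle index as the label. The gate is reduced to a hypothetical threshold on a classifier's probability that the 1-best is not the oracle. All function names and the 0.5 threshold are illustrative, not the authors' implementation.

```python
import math
from typing import List, Sequence


def word_errors(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions + insertions + deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution (or match)
        prev = cur
    return prev[-1]


def oracle_index(reference: str, nbest: List[str]) -> int:
    """Assumed oracle: the hypothesis with the fewest word errors (ties -> earliest)."""
    ref = reference.split()
    errors = [word_errors(ref, h.split()) for h in nbest]
    return errors.index(min(errors))


def classification_loss(scores: List[float], target: int) -> float:
    """Softmax cross-entropy over the N hypothesis scores, oracle index as the label."""
    m = max(scores)  # subtract the max before exponentiating, for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[target]


def select(scores: List[float]) -> int:
    """Inference: pick the hypothesis with the highest sequence-level score."""
    return max(range(len(scores)), key=lambda i: scores[i])


def should_rescore(p_not_oracle: float, threshold: float = 0.5) -> bool:
    """Hypothetical gate: run the 2nd pass only if the 1-best is likely not the oracle."""
    return p_not_oracle > threshold
```

For example, with the reference `"the cat sat on the mat"` and the N-best list `["the cat sad on the mat", "the cat sat on the mat", "a cat sat on a mat"]`, the word-error counts are 1, 0 and 2, so `oracle_index` returns 1 even though the 1st pass ranked entry 0 highest. Training would then push the score of entry 1 above the others through `classification_loss`, and the gate decides per utterance whether that rescoring is worth running at all.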