Search error risk minimization in Viterbi beam search for speech recognition

Takaaki Hori, Shinji Watanabe, Atsushi Nakamura
DOI: 10.21437/Interspeech.2010-101
Published in: 2010 IEEE International Conference on Acoustics, Speech and Signal Processing (2010-06-28)
Citations: 6

Abstract

This paper proposes a method to optimize Viterbi beam search based on search error risk minimization in large vocabulary continuous speech recognition (LVCSR). Most speech recognizers employ beam search to speed up the decoding process, in which unpromising partial hypotheses are pruned during decoding. However, the pruning step involves the risk of missing the best complete hypothesis by discarding a partial hypothesis that might grow into the best. Missing the best hypothesis is called search error. Our purpose is to reduce search error by optimizing the pruning step. While conventional methods use heuristic criteria to prune each hypothesis based on its score, rank, and so on, our proposed method introduces a pruning function that makes a more precise decision using the rich features extracted from each hypothesis. The parameters of the function can be estimated efficiently to minimize the search error risk using recognition lattices at the training step. We implemented the new method in a WFST-based decoder and achieved a significant reduction of search errors in a 200K-word LVCSR task.
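The abstract contrasts heuristic pruning (fixed beam width over scores) with a learned pruning function over richer per-hypothesis features. As a rough illustration, a minimal sketch of such feature-based pruning is given below; the feature set (`score_margin`, `rank`), the linear form, and the weight values are illustrative assumptions, not the paper's actual features or trained parameters (which are estimated from recognition lattices to minimize search error risk).

```python
def prune_hypotheses(hypotheses, weights, bias=0.0):
    """Prune partial hypotheses with a linear pruning function.

    hypotheses: list of dicts, each with a "score" (partial-path log score).
    weights:    dict mapping feature name -> weight (assumed trained offline).
    A hypothesis survives when w . f(h) + bias >= 0; a conventional beam
    would instead keep h whenever score(h) >= best_score - beam_width.
    """
    best = max(h["score"] for h in hypotheses)
    ranked = sorted(hypotheses, key=lambda h: h["score"], reverse=True)
    kept = []
    for rank, h in enumerate(ranked):
        features = {
            "score_margin": h["score"] - best,  # <= 0, distance from best hypothesis
            "rank": float(rank),                # position among active hypotheses
        }
        decision = bias + sum(weights.get(k, 0.0) * v for k, v in features.items())
        if decision >= 0.0:
            kept.append(h)
    return kept

# Illustrative weights: behaves like a beam of width 10 softened by a rank penalty.
weights = {"score_margin": 1.0, "rank": -0.5}
hyps = [{"score": s} for s in (-1.0, -4.0, -12.0, -2.5)]
survivors = prune_hypotheses(hyps, weights, bias=10.0)
# The hypothesis at score -12.0 falls outside the effective beam and is pruned.
```

With trained weights, the same decision rule can combine many more features per hypothesis (e.g. lookahead scores or state-level statistics), which is what allows a more precise keep/prune decision than a single score threshold.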