Evaluating long-term spectral subtraction for reverberant ASR

IEEE Workshop on Automatic Speech Recognition and Understanding, 2001 (ASRU '01). Pub Date: 2001-12-09. DOI: 10.1109/ASRU.2001.1034598

David Gelbart, Nelson Morgan

Citations: 46

Abstract

Even a modest degree of room reverberation can greatly increase the difficulty of automatic speech recognition. We have observed large increases in speech recognition word error rates when using a far-field (3-6 feet) microphone in a conference room, in comparison with recordings from head-mounted microphones. In this paper, we describe experiments with a proposed remedy based on the subtraction of an estimate of the log spectrum from a long-term (e.g., 2 s) analysis window, followed by overlap-add resynthesis. Since the technique is essentially one of enhancement, the processed signal it generates can be used as input for complete speech recognition systems. Here we report results with both the HTK and the SRI Hub-5 recognizer. For simpler recognizer configurations and/or moderate-sized training, the improvements are huge, while moderate improvements are still observed for more complex configurations under a number of conditions.
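The processing chain named in the abstract (long analysis window, subtraction of a log-spectrum estimate, overlap-add resynthesis) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the Hann window, the 50% overlap, averaging the log magnitude over the whole signal to form the estimate, and reusing the original phase are all assumptions made for illustration. Only the roughly 2 s window length and the overall structure come from the abstract.

```python
import numpy as np


def long_term_log_spectral_subtraction(signal, sample_rate, window_seconds=2.0, eps=1e-10):
    """Illustrative sketch: subtract the mean log-magnitude spectrum
    (computed over long windows) and resynthesize by overlap-add.

    Assumptions (not from the abstract): Hann window, 50% overlap,
    mean taken over all frames of the signal, original phase kept.
    """
    signal = np.asarray(signal, dtype=float)
    n_samples = len(signal)

    win_len = int(round(window_seconds * sample_rate))
    win_len -= win_len % 2              # force an even length so the hop is exact
    hop = win_len // 2                  # assumed 50% overlap

    # Periodic Hann window: copies shifted by half a window sum to exactly 1,
    # so plain overlap-add reconstructs an unmodified signal.
    window = np.hanning(win_len + 1)[:-1]

    # Zero-pad so every input sample is covered by exactly two frames
    # and no frame consists of padding only.
    n_frames = 2 + (n_samples - 1) // hop
    padded = np.zeros((n_frames + 1) * hop)
    padded[hop:hop + n_samples] = signal

    frames = np.stack([
        padded[i * hop:i * hop + win_len] * window for i in range(n_frames)
    ])

    spectra = np.fft.rfft(frames, axis=1)
    log_mag = np.log(np.abs(spectra) + eps)
    phase = np.angle(spectra)

    # Estimate of the long-term log spectrum: mean over frames, per frequency bin.
    mean_log_mag = log_mag.mean(axis=0)

    # Subtract the estimate in the log domain, then recombine with the phase.
    cleaned = np.exp(log_mag - mean_log_mag) * np.exp(1j * phase)
    out_frames = np.fft.irfft(cleaned, n=win_len, axis=1)

    # Overlap-add resynthesis.
    out = np.zeros(len(padded))
    for i in range(n_frames):
        out[i * hop:i * hop + win_len] += out_frames[i]

    return out[hop:hop + n_samples]
```

Because the output is a waveform of the same length as the input, it could in principle be passed unchanged to any recognizer front end, which matches the abstract's point that the technique is one of enhancement. Under these assumptions a fixed gain applied to the input becomes a constant offset in the log spectrum, so it is removed along with the rest of the estimate.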



