Histogram based normalization in the acoustic feature space

IEEE Workshop on Automatic Speech Recognition and Understanding, 2001 (ASRU '01). Pub Date: 2001-12-09. DOI: 10.1109/ASRU.2001.1034579

S. Molau, Michael Pitz, H. Ney

Citations: 70

Abstract

We describe a technique called histogram normalization that aims at normalizing feature space distributions at different stages in the signal analysis front-end, namely the log-compressed filterbank vectors, cepstrum coefficients, and LDA (linear discriminant analysis) transformed acoustic vectors. Best results are obtained at the filterbank stage, and in most cases there is a minor additional gain when normalization is applied sequentially at different stages. We show that histogram normalization performs best if applied in both training and recognition, and that smoothing the target histogram obtained on the training data is also helpful. On the VerbMobil II corpus, a German large-vocabulary conversational speech recognition task, we achieve an overall reduction in word error rate of about 10% relative.
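The abstract does not spell out the mapping itself, so the following is only a minimal sketch of one common way to realize histogram normalization: per-dimension quantile (CDF) matching of test features onto a target distribution estimated from training data. The function name `histogram_normalize`, the parameter `n_quantiles`, and the synthetic Gaussian data are assumptions made for illustration. The sketch does not include the target-histogram smoothing mentioned in the abstract and should not be read as the authors' exact procedure.

```python
import numpy as np


def histogram_normalize(test_feats, train_feats, n_quantiles=100):
    """Map each feature dimension of test_feats so that its empirical
    distribution approximately matches that of train_feats.

    Hypothetical helper: plain quantile matching, no histogram smoothing.
    """
    test_feats = np.asarray(test_feats, dtype=float)
    train_feats = np.asarray(train_feats, dtype=float)
    probs = np.linspace(0.0, 1.0, n_quantiles + 1)
    out = np.empty_like(test_feats)
    for d in range(test_feats.shape[1]):
        # Quantiles of the data to be normalized (source) and of the
        # training data (target) at the same cumulative probabilities.
        src_q = np.quantile(test_feats[:, d], probs)
        tgt_q = np.quantile(train_feats[:, d], probs)
        # Piecewise-linear, monotone mapping from source to target quantiles.
        # Assumes continuous-valued features, so src_q has no long runs of ties.
        out[:, d] = np.interp(test_feats[:, d], src_q, tgt_q)
    return out


# Synthetic stand-ins for, e.g., log-filterbank vectors (frames x dimensions).
rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(5000, 3))   # target (training) condition
test = rng.normal(2.0, 0.5, size=(2000, 3))    # mismatched test condition
normalized = histogram_normalize(test, train)
```

Because the mapping is monotone within each dimension, the ordering of frames along that dimension is preserved while the marginal distribution moves toward the training distribution. Applying the same transform to the training data itself (per speaker or per session, for instance) would correspond to the "both in training and recognition" setting described in the abstract, although the grouping used by the authors is not stated here.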
