A comparative study of model-based adaptation techniques for a compact speech recognizer

IEEE Workshop on Automatic Speech Recognition and Understanding, 2001 (ASRU '01). Pub date: 2001-12-09. DOI: 10.1109/ASRU.2001.1034581

F. Thiele, R. Bippus

Citations: 2

Abstract

Many techniques for speaker adaptation have been successfully applied to automatic speech recognition. This paper compares the performance of several adaptation methods with respect to their memory requirements and processing demands. For adaptation of a compact acoustic model with 4k densities, eigenvoices and structural MAP (SMAP) are investigated alongside the well-known techniques of MAP (maximum a posteriori) and MLLR (maximum likelihood linear regression) adaptation. Experimental results are reported for unsupervised on-line adaptation with different amounts of adaptation data, ranging from 4 to 500 words per speaker. The results show that for small amounts of adaptation data it might be more efficient to employ a larger baseline acoustic model without adaptation. Eigenvoices achieve the lowest word error rates of all adaptation techniques, but SMAP presents a good compromise between memory requirement and accuracy.
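As an illustration of the MAP technique named in the abstract, the following is a minimal sketch of the standard MAP update for a single Gaussian mean. It is not the authors' implementation; the function name `map_adapt_mean`, the prior weight `tau`, and its default value are assumptions made for this example. It shows why small amounts of adaptation data move a density only slightly: with little occupancy, the speaker-independent prior dominates the estimate.

```python
def map_adapt_mean(prior_mean, frames, posteriors, tau=10.0):
    """MAP re-estimate of one Gaussian mean (illustrative sketch).

    prior_mean: speaker-independent mean, a list of floats
    frames:     adaptation feature vectors, each a list of floats
    posteriors: occupation probability of this density for each frame
    tau:        prior weight (hypothetical value, not from the paper)
    """
    dim = len(prior_mean)
    occupancy = sum(posteriors)  # soft count of frames assigned to this density
    weighted_sum = [0.0] * dim
    for x, gamma in zip(frames, posteriors):
        for d in range(dim):
            weighted_sum[d] += gamma * x[d]
    # Interpolate between the prior mean and the data, weighted by tau and occupancy.
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
        for d in range(dim)
    ]


if __name__ == "__main__":
    prior = [0.0, 0.0]
    few = map_adapt_mean(prior, [[1.0, 2.0]] * 2, [1.0] * 2)
    many = map_adapt_mean(prior, [[1.0, 2.0]] * 200, [1.0] * 200)
    print("2 frames:  ", few)
    print("200 frames:", many)
```

With only a few frames the adapted mean stays close to the prior, whereas with many frames it approaches the sample mean of the adaptation data. Densities that receive no adaptation data at all are left unchanged by this update.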

