Comments on "Vocal Tract Length Normalization Equals Linear Transformation in Cepstral Space"

IEEE Trans. Speech Audio Process. Pub Date : 2007-07-01 DOI:10.1109/TASL.2007.896653

M. Afify, O. Siohan

Citations: 11

Abstract

The bilinear transformation (BT) is used for vocal tract length normalization (VTLN) in speech recognition systems. We prove two properties of the bilinear mapping that motivated the band-diagonal transform proposed in M. Afify and O. Siohan, "Constrained maximum likelihood linear regression for speaker adaptation," in Proc. ICSLP, Beijing, China, Oct. 2000. This contradicts the statement in M. Pitz and H. Ney, "Vocal tract length normalization equals linear transformation in cepstral space," IEEE Transactions on Speech and Audio Processing, vol. 13, no. 5, pp. 930-944, September 2005, that the transform of Afify and Siohan was motivated by empirical observations.
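For readers unfamiliar with the bilinear mapping mentioned in the abstract, the following is a minimal sketch of the frequency warping it is commonly taken to induce, assuming the standard first-order all-pass form with warping parameter `alpha` (|alpha| < 1). The function name `bilinear_warp`, the sign convention for `alpha`, and the sample values are illustrative assumptions; they are not taken from the paper, and the sketch does not reproduce the two properties the authors prove.

```python
import math


def bilinear_warp(omega: float, alpha: float) -> float:
    """Warp a normalized frequency omega (radians, 0..pi) with the
    first-order all-pass (bilinear) map; alpha is an assumed warping
    parameter with |alpha| < 1, and alpha = 0 leaves omega unchanged."""
    if not -1.0 < alpha < 1.0:
        raise ValueError("alpha must satisfy |alpha| < 1")
    # Phase of the all-pass section evaluated on the unit circle.
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))


if __name__ == "__main__":
    # Print the warped frequency at a few points for an illustrative alpha.
    for k in range(5):
        w = k * math.pi / 4
        print(f"omega={w:.4f}  warped={bilinear_warp(w, 0.1):.4f}")
```

Under this convention the endpoints 0 and pi are fixed and the map is monotonic, so a single parameter stretches or compresses the frequency axis.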

