Vector quantization of LSF parameters with a mixture of dirichlet distributions

IEEE Transactions on Audio Speech and Language Processing Pub Date : 2013-09-01 DOI:10.1109/TASL.2013.2238732

Zhanyu Ma, A. Leijon, W. Kleijn

{"title":"Vector quantization of LSF parameters with a mixture of dirichlet distributions","authors":"Zhanyu Ma, A. Leijon, W. Kleijn","doi":"10.1109/TASL.2013.2238732","DOIUrl":null,"url":null,"abstract":"Quantization of the linear predictive coding parameters is an important part in speech coding. Probability density function (PDF)-optimized vector quantization (VQ) has been previously shown to be more efficient than VQ based only on training data. For data with bounded support, some well-defined bounded-support distributions (e.g., the Dirichlet distribution) have been proven to outperform the conventional Gaussian mixture model (GMM), with the same number of free parameters required to describe the model. When exploiting both the boundary and the order properties of the line spectral frequency (LSF) parameters, the distribution of LSF differences LSF can be modelled with a Dirichlet mixture model (DMM). We propose a corresponding DMM based VQ. The elements in a Dirichlet vector variable are highly mutually correlated. Motivated by the Dirichlet vector variable's neutrality property, a practical non-linear transformation scheme for the Dirichlet vector variable can be obtained. Similar to the Karhunen-Loève transform for Gaussian variables, this non-linear transformation decomposes the Dirichlet vector variable into a set of independent beta-distributed variables. Using high rate quantization theory and by the entropy constraint, the optimal inter- and intra-component bit allocation strategies are proposed. In the implementation of scalar quantizers, we use the constrained-resolution coding to approximate the derived constrained-entropy coding. A practical coding scheme for DVQ is designed for the purpose of reducing the quantization error accumulation. The theoretical and practical quantization performance of DVQ is evaluated. Compared to the state-of-the-art GMM-based VQ and recently proposed beta mixture model (BMM) based VQ, DVQ performs better, with even fewer free parameters and lower computational cost","PeriodicalId":55014,"journal":{"name":"IEEE Transactions on Audio Speech and Language Processing","volume":"21 1","pages":"1777-1790"},"PeriodicalIF":0.0000,"publicationDate":"2013-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TASL.2013.2238732","citationCount":"55","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Audio Speech and Language Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TASL.2013.2238732","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 55

Abstract

Quantization of the linear predictive coding parameters is an important part in speech coding. Probability density function (PDF)-optimized vector quantization (VQ) has been previously shown to be more efficient than VQ based only on training data. For data with bounded support, some well-defined bounded-support distributions (e.g., the Dirichlet distribution) have been proven to outperform the conventional Gaussian mixture model (GMM), with the same number of free parameters required to describe the model. When exploiting both the boundary and the order properties of the line spectral frequency (LSF) parameters, the distribution of LSF differences LSF can be modelled with a Dirichlet mixture model (DMM). We propose a corresponding DMM based VQ. The elements in a Dirichlet vector variable are highly mutually correlated. Motivated by the Dirichlet vector variable's neutrality property, a practical non-linear transformation scheme for the Dirichlet vector variable can be obtained. Similar to the Karhunen-Loève transform for Gaussian variables, this non-linear transformation decomposes the Dirichlet vector variable into a set of independent beta-distributed variables. Using high rate quantization theory and by the entropy constraint, the optimal inter- and intra-component bit allocation strategies are proposed. In the implementation of scalar quantizers, we use the constrained-resolution coding to approximate the derived constrained-entropy coding. A practical coding scheme for DVQ is designed for the purpose of reducing the quantization error accumulation. The theoretical and practical quantization performance of DVQ is evaluated. Compared to the state-of-the-art GMM-based VQ and recently proposed beta mixture model (BMM) based VQ, DVQ performs better, with even fewer free parameters and lower computational cost

查看原文本刊更多论文

混合狄利克雷分布LSF参数的矢量量化

线性预测编码参数的量化是语音编码的重要组成部分。概率密度函数(PDF)优化向量量化(VQ)先前已被证明比仅基于训练数据的矢量量化(VQ)更有效。对于具有有界支持的数据，一些定义良好的有界支持分布(例如Dirichlet分布)已被证明优于传统的高斯混合模型(GMM)，具有相同数量的描述模型所需的自由参数。利用线谱频率参数的边界特性和阶数特性，可以用Dirichlet混合模型(DMM)对线谱频率差的分布进行建模。我们提出了一个相应的基于DMM的VQ。狄利克雷向量变量中的元素是高度相互关联的。利用狄利克雷向量变量的中立性，可以得到狄利克雷向量变量的一种实用的非线性变换方案。类似于高斯变量的karhunen - lo变换，这种非线性变换将狄利克雷向量变量分解为一组独立的β分布变量。利用高速率量化理论和熵约束，提出了组件间和组件内的最佳比特分配策略。在标量量化器的实现中，我们使用约束分辨率编码来近似派生的约束熵编码。为了减少量化误差积累，设计了一种实用的DVQ编码方案。对DVQ的量化性能进行了理论和实践评价。与最先进的基于gmm的VQ和最近提出的基于beta混合模型(BMM)的VQ相比，DVQ的性能更好，自由参数更少，计算成本更低

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

IEEE Transactions on Audio Speech and Language Processing 工程技术-工程：电子与电气

自引率

0.00%

发文量

审稿时长

24.0 months

期刊介绍： The IEEE Transactions on Audio, Speech and Language Processing covers the sciences, technologies and applications relating to the analysis, coding, enhancement, recognition and synthesis of audio, music, speech and language. In particular, audio processing also covers auditory modeling, acoustic modeling and source separation. Speech processing also covers speech production and perception, adaptation, lexical modeling and speaker recognition. Language processing also covers spoken language understanding, translation, summarization, mining, general language modeling, as well as spoken dialog systems.