Improved Semi-Parametric Mean Trajectory Model Using Discriminatively Trained Centroids

2008 6th International Symposium on Chinese Spoken Language Processing | Pub Date: 2008-12-30 | DOI: 10.1109/CHINSL.2008.ECP.63

Ran Xu, Jielin Pan, Yonghong Yan

Abstract

To alleviate the limitation imposed by the "state output probability conditional independence" assumption of hidden Markov models (HMMs) in speech recognition, a discriminative semi-parametric trajectory model has been proposed in recent years, in which both the means and the variances of the acoustic models are modeled as time-varying variables. The time-varying information is modeled as a weighted contribution from all the "centroids", which can be viewed as a representation of the acoustic space. In the previous literature, such centroids are usually obtained either by clustering the Gaussians of the baseline acoustic models down to a reasonable number or by training a baseline model with fewer Gaussian components. Centroids obtained in this way are maximum likelihood estimates of the acoustic space and are relatively weak in discriminability compared with discriminatively trained acoustic models. In this paper, we propose an improved training framework for the semi-parametric mean trajectory model, in which the centroids are first discriminatively trained with the minimum phone error (MPE) criterion to provide a more discriminative representation of the acoustic space. The method was evaluated on a Mandarin digit string recognition task. The experimental results show that the proposed method achieves a relative string error rate reduction of 7.5% over the traditional discriminative semi-parametric trajectory model, and a relative string error rate reduction of 28.6% over the baseline acoustic model trained with the maximum likelihood criterion.
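
The "weighted contribution from all the centroids" can be illustrated with a small sketch. The abstract does not give the model equations, so everything below is an assumption made for illustration only: the time-varying mean is taken to be a static mean plus a sum of per-centroid bias vectors, each weighted by the posterior probability of its centroid given the current frame, and the centroids are taken to be diagonal-covariance Gaussians. The function names, the bias vectors, and the toy numbers are hypothetical, and the minimum phone error training of the centroids is not shown.

```python
import math


def centroid_posteriors(obs, centroids, variances, priors):
    """Posterior weight of each diagonal-Gaussian centroid given one frame.

    Assumed form (not taken from the paper): each centroid is a Gaussian with
    a diagonal covariance, and the weight is its normalized likelihood.
    """
    log_scores = []
    for mean, var, prior in zip(centroids, variances, priors):
        ll = math.log(prior)
        for x, m, v in zip(obs, mean, var):
            ll += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        log_scores.append(ll)
    top = max(log_scores)  # subtract the max before exponentiating for stability
    exps = [math.exp(s - top) for s in log_scores]
    total = sum(exps)
    return [e / total for e in exps]


def time_varying_mean(static_mean, posteriors, biases):
    """Static mean shifted by the posterior-weighted sum of centroid biases.

    The additive-bias form is an illustrative assumption, not the paper's
    stated parameterization.
    """
    shifted = list(static_mean)
    for h, bias in zip(posteriors, biases):
        for d, b in enumerate(bias):
            shifted[d] += h * b
    return shifted


# Hypothetical toy setup: two centroids in a 2-dimensional feature space.
centroids = [[0.0, 0.0], [4.0, 4.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
priors = [0.5, 0.5]
biases = [[0.1, 0.0], [-0.2, 0.3]]  # one bias vector per centroid
static_mean = [1.0, 1.0]

frame = [2.0, 2.0]  # equidistant from both centroids
weights = centroid_posteriors(frame, centroids, variances, priors)
mean_t = time_varying_mean(static_mean, weights, biases)
```

Because the toy frame lies midway between the two centroids, both receive the same weight and the mean is shifted by the average of the two bias vectors. A frame closer to one centroid would be dominated by that centroid's bias, which is how the mean comes to vary from frame to frame in this sketch.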
