A factor analysis model of sequences for language recognition

M. Omar
2016 IEEE Spoken Language Technology Workshop (SLT), December 2016. DOI: 10.1109/SLT.2016.7846287. Citations: 1.

Abstract

The application of joint factor analysis [1] to speaker and language recognition advanced the performance of automatic systems in these areas. A special case of the early work in [1], namely the i-vector representation [2], has been applied successfully in many areas, including speaker [2], language [3], and speech recognition [4]. This work presents a novel model which represents a long sequence of observations using the factor analysis model of shorter overlapping subsequences. This model takes into consideration the dependency between adjacent latent vectors. It is shown that this model outperforms the current joint factor analysis approach, which assumes independent and identically distributed (iid) observations given one global latent vector. In addition, we replace the language-independent prior model of the latent vector in the i-vector model with a language-dependent prior model, and we modify the objective function used in estimating the factor analysis projection matrix and the prior model to correspond to the cross-entropy objective function estimated under this new model. We also derive the update equations of the projection matrix and the prior model parameters which maximize the cross-entropy objective function. We evaluate our approach on the language recognition task of the Robust Automatic Transcription of Speech (RATS) project. Our experiments show relative improvements of up to 11% in equal error rate using the proposed approach, compared to the standard approach of using an i-vector representation [2].
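The core structural idea of the abstract, replacing one global latent vector for an utterance with one latent vector per overlapping subsequence, can be sketched in a few lines. This is a minimal illustration, not the paper's actual estimation procedure: the feature dimensions, window width and hop, the random projection matrix `T`, and the use of simple window means in place of Baum-Welch statistics are all assumptions made here for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy feature sequence: 100 frames of 20-dimensional acoustic features.
X = rng.normal(size=(100, 20))

def windows(X, width, hop):
    """Split a sequence into overlapping subsequences (windows)."""
    return [X[s:s + width] for s in range(0, len(X) - width + 1, hop)]

# Hypothetical factor analysis projection matrix mapping a 20-dimensional
# mean statistic to a 5-dimensional latent space (values are illustrative).
T = rng.normal(size=(20, 5))

# Standard i-vector-style view: ONE global latent vector per utterance,
# with all frames treated as iid given that single vector.
w_global = X.mean(axis=0) @ T

# Subsequence view (as sketched here): one latent vector per overlapping
# window, so the dependency between adjacent latent vectors can be modeled.
W_local = np.stack([w.mean(axis=0) @ T
                    for w in windows(X, width=30, hop=10)])

print(w_global.shape)  # one 5-dimensional latent vector
print(W_local.shape)   # one 5-dimensional latent vector per window
```

With a 100-frame sequence, a 30-frame window, and a 10-frame hop, the sketch yields eight overlapping windows, so `W_local` is a sequence of latent vectors rather than a single point, which is what allows an adjacency-dependent prior to be placed over them.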