Speech representation based on tensor factor analysis and its application to speaker recognition and language identification

2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). Pub Date: 2019-11-01. DOI: 10.1109/APSIPAASC47483.2019.9023128

D. Saito, So Suzuki, N. Minematsu


Speech representation based on tensor factor analysis and its application to speaker recognition and language identification

Abstract: This paper proposes a novel approach to speech representation for both speaker recognition and language identification that characterizes the entire feature space with a tensor. In conventional studies of both tasks, the i-vector is commonly used as the state-of-the-art representation. i-vector extraction can be regarded as the projection of an utterance-based GMM supervector onto a low-dimensional space. In this paper, to explicitly model the correlation among the mean vectors of a GMM, an utterance is modeled not as a GMM-based supervector but as a matrix, and the entire set of utterances is modeled as a tensor. By applying tensor factor analysis, we obtain a new representation for an input utterance. Experimental evaluations on speaker recognition and language identification show that the proposed approach is effective, especially for the speaker recognition task.
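
The matrix-and-tensor view described in the abstract can be illustrated with a small sketch. The abstract does not specify the tensor factor analysis itself, so the code below swaps in a plain truncated higher-order SVD (a Tucker-style decomposition) as a stand-in; it is not the authors' model. The array layout (N utterances, C Gaussian components, D feature dimensions), the function names, and the ranks are all assumptions made for illustration, and the data are synthetic.

```python
import numpy as np


def unfold(tensor, mode):
    """Matricize a 3-way array: rows index `mode`, columns index the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def fit_bases(utterances, rank_c, rank_d):
    """Estimate component-side and feature-side bases by truncated HOSVD.

    utterances: array of shape (N, C, D), one C x D matrix of GMM mean
    offsets per utterance (hypothetical layout, not taken from the paper).
    """
    u, _, _ = np.linalg.svd(unfold(utterances, 1), full_matrices=False)
    v, _, _ = np.linalg.svd(unfold(utterances, 2), full_matrices=False)
    return u[:, :rank_c], v[:, :rank_d]


def extract(matrix, u, v):
    """Project one C x D utterance matrix to a (rank_c * rank_d)-dim vector."""
    return (u.T @ matrix @ v).ravel()


def reconstruct(vector, u, v):
    """Map a low-dimensional vector back to an approximate C x D matrix."""
    core = vector.reshape(u.shape[1], v.shape[1])
    return u @ core @ v.T


# Synthetic stand-in for per-utterance mean matrices (no real speech data).
rng = np.random.default_rng(0)
N, C, D = 50, 8, 5
mats = rng.standard_normal((N, C, D))
mats -= mats.mean(axis=0)  # center, loosely analogous to removing a global mean

U, V = fit_bases(mats, rank_c=4, rank_d=3)
rep = extract(mats[0], U, V)
print(rep.shape)  # (12,)
```

In contrast to a supervector projection, which flattens each utterance to a single C*D vector before reducing its dimension, this bilinear form keeps separate bases for the component axis and the feature axis, which is one simple way to expose structure shared across the mean vectors.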
