Robust speaker clustering in eigenspace

IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01. Pub Date: 2001-12-09. DOI: 10.1109/ASRU.2001.1034588

R. Faltlhauser, G. Ruske


Citations: 25

Abstract


We propose a speaker clustering scheme that works in 'eigenspace'. Speaker models are transformed to a low-dimensional subspace using 'eigenvoices'. For the speaker clustering procedure, simple distance measures, e.g. Euclidean distance, can then be applied. Moreover, clustering can be accomplished with base models (for eigenvoice projection) such as Gaussian mixture models as well as conventional HMMs. In the case of HMMs, re-projection to the original space readily yields acoustic models. Clustering in the subspace produces well-balanced clusters and is easy to control.

In the field of speaker adaptation, several principal techniques can be distinguished. The most prominent among them are Bayesian adaptation (e.g. MAP), transformation-based approaches (MLLR, maximum likelihood linear regression), and so-called eigenspace techniques. The latter in particular have become increasingly popular, as they make use of a priori information about the distribution of speaker models. The basic approach is commonly called the eigenvoice (EV) approach. Besides these techniques, speaker clustering is a further attractive adaptation scheme, especially since it can be, and has been, easily combined with the above methods.
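The pipeline described in the abstract (project speaker models into a low-dimensional eigenvoice subspace, cluster there with Euclidean distance, then re-project to the original space) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration, not the authors' implementation: speaker models are represented as synthetic "supervectors" (e.g. stacked model means), eigenvoices are obtained by PCA via SVD, and plain k-means stands in for the clustering step, which the abstract does not specify. The function names `fit_eigenvoices`, `project`, `reproject`, and `kmeans` are hypothetical.

```python
import numpy as np

def fit_eigenvoices(supervectors, n_eigenvoices):
    """PCA over speaker supervectors (assumed way to obtain eigenvoices)."""
    mean = supervectors.mean(axis=0)
    centered = supervectors - mean
    # Rows of vt are orthonormal principal directions, sorted by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_eigenvoices]

def project(supervectors, mean, eigenvoices):
    """Map supervectors to low-dimensional eigenspace coordinates."""
    return (supervectors - mean) @ eigenvoices.T

def reproject(coords, mean, eigenvoices):
    """Map eigenspace coordinates back to the original model space."""
    return mean + coords @ eigenvoices

def kmeans(points, k, n_iter=50):
    """Plain k-means with Euclidean distance and farthest-point initialisation."""
    centroids = [points[0]]
    for _ in range(1, k):
        # Distance of every point to its nearest already-chosen centroid.
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[np.argmax(d)])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Synthetic stand-in data: 40 "speakers", 60-dimensional supervectors,
# drawn from two well-separated groups (purely illustrative).
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 1.0, size=(20, 60))
group_b = rng.normal(3.0, 1.0, size=(20, 60))
supervectors = np.vstack([group_a, group_b])

mean, ev = fit_eigenvoices(supervectors, n_eigenvoices=2)
coords = project(supervectors, mean, ev)       # speakers in eigenspace
labels, centroids = kmeans(coords, k=2)        # Euclidean clustering
cluster_models = reproject(centroids, mean, ev)  # one model per cluster
print(labels)
print(cluster_models.shape)
```

Because the eigenvoices are orthonormal, re-projecting a cluster centroid gives a point in the original model space directly, which mirrors the abstract's remark that re-projection readily yields acoustic models in the HMM case.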

