Dynamic speaker clustering algorithm based on minimal GMM distance tracing

2011 IEEE International Conference on Cyber Technology in Automation, Control, and Intelligent Systems. Pub Date: 2011-03-20. DOI: 10.1109/CYBER.2011.6011763

Jun He, Qian-Hua He, Zhifeng Wang, Yan-xiong Li, Hai-Yu Luo

Abstract

Most speaker clustering algorithms rely heavily on preset thresholds, whose optimal values are hard to determine. This paper proposes a speaker clustering algorithm that traces the minimal Bhattacharyya distance between two Gaussian Mixture Models (GMMs) and requires no preset thresholds. During clustering, if utterance sets A and B have the minimal distance, set B is regarded as a suspicious set whose utterances may come from the speaker of set A, and a two-stage verification is applied. First, a comparative likelihood is used to verify whether the suspicious set B was generated by the speaker of set A. Second, a comparative likelihood for each utterance in set B is used to judge whether it was produced by the speaker of set A. If it was, the utterance is moved from set B to set A, and the models of sets A and B are then updated. The two stages are repeated until no utterance set changes. Experiments on the Chinese 863 speech database give an average cluster purity (ACP) of 68.97% and a classification error ratio (CER) of 39%. For comparison, K-means and the Iterative Self-Organizing Data Analysis technique (ISODATA), each with optimal thresholds, give CERs of 35% and 38%, respectively.
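
The distance-tracing step can be illustrated with a small Python sketch. It is not the paper's implementation: the abstract does not say how the Bhattacharyya distance between two GMMs is computed, so this sketch assumes each utterance set is modelled by a single diagonal-covariance Gaussian, for which the distance has a closed form. The function names (`fit_diag_gaussian`, `bhattacharyya`, `closest_pair`) and the toy feature vectors are hypothetical, and the two likelihood-based verification stages are left out because the abstract does not define the comparative likelihood.

```python
import math
from itertools import combinations


def fit_diag_gaussian(frames):
    """Return (mean, variance) per dimension for a list of feature vectors."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Variance floor keeps the logarithm below well defined (assumption of this sketch).
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-6)
           for d in range(dim)]
    return mean, var


def bhattacharyya(g1, g2):
    """Closed-form Bhattacharyya distance between two diagonal Gaussians."""
    (m1, v1), (m2, v2) = g1, g2
    dist = 0.0
    for a, b, s1, s2 in zip(m1, m2, v1, v2):
        s = 0.5 * (s1 + s2)  # averaged variance for this dimension
        dist += 0.125 * (a - b) ** 2 / s            # mean-separation term
        dist += 0.5 * math.log(s / math.sqrt(s1 * s2))  # variance-mismatch term
    return dist


def closest_pair(models):
    """Return (distance, name_a, name_b) for the two closest utterance sets."""
    return min((bhattacharyya(models[a], models[b]), a, b)
               for a, b in combinations(sorted(models), 2))


# Toy utterance sets: A and B overlap, C lies far away from both.
sets = {
    "A": [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]],
    "B": [[0.2, 0.1], [1.1, 0.9], [0.1, 1.2], [0.9, 0.0]],
    "C": [[5.0, 5.0], [6.0, 6.0], [5.0, 6.0], [6.0, 5.0]],
}
models = {name: fit_diag_gaussian(frames) for name, frames in sets.items()}
dist, set_a, set_b = closest_pair(models)
print(set_a, set_b, round(dist, 4))
```

In the scheme the abstract describes, the pair returned by `closest_pair` would supply set A and the suspicious set B for the two verification stages, and the search would be repeated after the models are updated.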
