Dynamic speaker clustering algorithm based on minimal GMM distance tracing

Jun He, Qian-Hua He, Zhifeng Wang, Yan-xiong Li, Hai-Yu Luo
DOI: 10.1109/CYBER.2011.6011763
Published in: 2011 IEEE International Conference on Cyber Technology in Automation, Control, and Intelligent Systems (publication date: 2011-03-20)
Citations: 0

Abstract

In speaker clustering, most algorithms rely heavily on pre-set thresholds whose optimal values are hard to determine. This paper proposes a speaker clustering algorithm that needs no pre-set thresholds. It works by tracing the minimal Bhattacharyya distance between pairs of Gaussian Mixture Models (GMMs). During clustering, if utterance sets A and B have the minimal distance, set B is regarded as a suspicious set whose utterances may come from the speaker of set A. A two-stage verification then follows. First, a comparative likelihood is used to verify whether the suspicious set B was generated by the speaker of set A. Second, a comparative likelihood is computed for each utterance in set B to judge whether that utterance was produced by the speaker of set A; if it was, the utterance is moved from set B to set A. The models of sets A and B are then updated, and the two stages are repeated until no speech set changes. Experiments on the Chinese 863 speech database give an average cluster purity (ACP) of 68.97% and a classification error ratio (CER) of 39%. For comparison, K-means and Iterative Self-Organizing Data Analysis (ISODATA) with optimal thresholds give CERs of 35% and 38%, respectively.
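The procedure above can be sketched in Python, but only under several assumptions that the abstract does not settle. The abstract gives neither the GMM order nor the approximation used for the Bhattacharyya distance between mixtures (which has no closed form), nor the exact comparative-likelihood rules. This sketch therefore models each set with a single diagonal-covariance Gaussian (a one-component GMM), for which the Bhattacharyya distance is closed-form. The two verification rules are hypothetical stand-ins. Stage 1 accepts B only if A's model scores B at least as well as every other set's model. Stage 2 moves an utterance when A's model gives it a higher mean per-frame log-likelihood than a leave-one-out model of B. Utterances are lists of feature frames (for example, cepstral vectors). All function names, `VAR_FLOOR`, and `max_rounds` are illustrative and are not taken from the paper.

```python
import math
from itertools import combinations

VAR_FLOOR = 1e-3  # hypothetical variance floor; keeps log-likelihoods finite


def fit_gaussian(frames):
    """Fit one diagonal-covariance Gaussian (a single-component GMM) to frames."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, VAR_FLOOR)
           for d in range(dim)]
    return mean, var


def bhattacharyya(g1, g2):
    """Closed-form Bhattacharyya distance between two diagonal Gaussians."""
    (m1, v1), (m2, v2) = g1, g2
    dist = 0.0
    for a, b, s1, s2 in zip(m1, m2, v1, v2):
        s = 0.5 * (s1 + s2)  # averaged variance in this dimension
        dist += 0.125 * (a - b) ** 2 / s + 0.5 * math.log(s / math.sqrt(s1 * s2))
    return dist


def avg_loglik(frames, g):
    """Mean per-frame log-likelihood of frames under a diagonal Gaussian."""
    mean, var = g
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, mean, var):
            total -= 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)


def frames_of(utterances):
    """Pool the frames of several utterances."""
    return [f for u in utterances for f in u]


def verify_and_move(sets, a, b):
    """Two-stage check of suspicious set b against set a; True if anything moved."""
    models = {i: fit_gaussian(frames_of(s)) for i, s in enumerate(sets) if s}
    if a not in models or b not in models:
        return False
    fb = frames_of(sets[b])
    # Stage 1 (hypothetical rule): A's model must explain B at least as well
    # as the model of every other set.
    score_a = avg_loglik(fb, models[a])
    if any(avg_loglik(fb, models[c]) > score_a for c in models if c not in (a, b)):
        return False
    # Stage 2 (hypothetical rule): move an utterance when A's model beats a
    # leave-one-out model of B; decisions use the models from before any move.
    to_move = []
    for i, u in enumerate(sets[b]):
        rest = [v for j, v in enumerate(sets[b]) if j != i]
        if rest and avg_loglik(u, models[a]) > avg_loglik(u, fit_gaussian(frames_of(rest))):
            to_move.append(i)
    for i in reversed(to_move):
        sets[a].append(sets[b].pop(i))
    return bool(to_move)


def cluster(sets, max_rounds=100):
    """Trace the minimal-distance pair and verify, until no set changes."""
    sets = [list(s) for s in sets]
    for _ in range(max_rounds):
        if len(sets) < 2:
            break
        models = [fit_gaussian(frames_of(s)) for s in sets]
        a, b = min(combinations(range(len(sets)), 2),
                   key=lambda p: bhattacharyya(models[p[0]], models[p[1]]))
        # The abstract does not say which set of the pair is "suspicious",
        # so this sketch tries both orientations.
        changed = verify_and_move(sets, a, b)
        changed = verify_and_move(sets, b, a) or changed
        sets = [s for s in sets if s]  # drop any set emptied by the moves
        if not changed:
            break
    return sets
```

Consider a 1-D feature space with three short utterances near 0 and three near 10, starting from a partition where one of the "10" utterances sits with the "0" utterances. In that case the sketch moves it to the other set in the first round, and the second round changes nothing. All decisions compare likelihoods rather than a tuned cutoff, so no distance threshold appears. The variance floor is the only numeric constant.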