Jun He, Qian-Hua He, Zhifeng Wang, Yan-xiong Li, Hai-Yu Luo
2011 IEEE International Conference on Cyber Technology in Automation, Control, and Intelligent Systems. Published 2011-03-20. DOI: 10.1109/CYBER.2011.6011763 (https://doi.org/10.1109/CYBER.2011.6011763)
Dynamic speaker clustering algorithm based on minimal GMM distance tracing
Most speaker clustering algorithms rely heavily on preset thresholds, and finding optimal values for them is laborious. This paper proposes a speaker clustering algorithm that traces the minimal Bhattacharyya distance between two Gaussian Mixture Models (GMMs) and requires no preset thresholds. During clustering, if utterance sets A and B have the minimal distance, set B is regarded as a suspicious set whose utterances may come from the speaker of set A. A two-stage verification is then applied. First, a comparative likelihood is used to verify whether the suspicious set B was generated by the speaker of set A. Second, a comparative likelihood for each utterance in set B is used to judge whether that utterance was produced by the speaker of set A. If so, the utterance is moved from set B to set A, and the models of sets A and B are updated. The two stages are repeated until no utterance set changes. Experiments on the Chinese 863 speech database give an average cluster purity (ACP) of 68.97% and a classification error ratio (CER) of 39%. By comparison, K-means and Iterative Self-Organizing Data Analysis (ISODATA) with optimal thresholds give CERs of 35% and 38%, respectively.
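
The distance-tracing step can be illustrated with a minimal sketch. The abstract does not say how the Bhattacharyya distance is evaluated between full GMMs (there is no general closed form for mixtures), so this sketch assumes each utterance set is summarized by a single diagonal-covariance Gaussian, for which the distance has a closed form. The function names `fit_diag_gaussian`, `bhattacharyya_diag`, and `closest_pair` are hypothetical and not taken from the paper. The two verification stages are left out because the abstract does not define the comparative likelihood.

```python
import numpy as np


def fit_diag_gaussian(frames, var_floor=1e-6):
    """Fit a diagonal Gaussian to an (n_frames, n_dims) feature array.

    Assumption: one Gaussian per utterance set stands in for the paper's GMM.
    """
    frames = np.asarray(frames, dtype=float)
    mean = frames.mean(axis=0)
    var = frames.var(axis=0) + var_floor  # floor avoids log(0) below
    return mean, var


def bhattacharyya_diag(model_a, model_b):
    """Closed-form Bhattacharyya distance between two diagonal Gaussians."""
    mean_a, var_a = model_a
    mean_b, var_b = model_b
    var_avg = 0.5 * (var_a + var_b)
    # Mean term: (1/8) (mu_a - mu_b)^T Sigma^-1 (mu_a - mu_b)
    mean_term = 0.125 * np.sum((mean_a - mean_b) ** 2 / var_avg)
    # Covariance term: (1/2) ln(|Sigma| / sqrt(|Sigma_a| |Sigma_b|))
    cov_term = 0.5 * np.sum(
        np.log(var_avg) - 0.5 * (np.log(var_a) + np.log(var_b))
    )
    return mean_term + cov_term


def closest_pair(models):
    """Return (i, j, distance) for the two models with minimal distance."""
    best = None
    for i in range(len(models)):
        for j in range(i + 1, len(models)):
            d = bhattacharyya_diag(models[i], models[j])
            if best is None or d < best[2]:
                best = (i, j, d)
    return best


if __name__ == "__main__":
    # Synthetic stand-ins for acoustic features; sets 0 and 2 share a source.
    rng = np.random.default_rng(0)
    utterance_sets = [
        rng.normal(0.0, 1.0, size=(200, 4)),
        rng.normal(5.0, 1.0, size=(200, 4)),
        rng.normal(0.0, 1.0, size=(200, 4)),
    ]
    models = [fit_diag_gaussian(frames) for frames in utterance_sets]
    i, j, d = closest_pair(models)
    print(f"closest sets: {i} and {j}, distance {d:.4f}")
```

In the procedure the abstract describes, the pair returned here would play the roles of sets A and B: set B becomes the suspicious set, and the models are refitted after any utterances are moved.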