A Novel Subspace-Based GMM Clustering Ensemble Algorithm for High-Dimensional Data
Yulin He, Yingting He, Zhaowu Zhan, Philippe Fournier-Viger, Joshua Zhexue Huang
Chinese Journal of Electronics, vol. 34, no. 2, pp. 612-629, March 2025. DOI: 10.23919/cje.2023.00.153. IEEE Xplore: https://ieeexplore.ieee.org/document/10982075/
The Gaussian mixture model (GMM) is a classical probabilistic representation model widely used in unsupervised learning. However, GMM performs poorly on high-dimensional data (HDD) because it must estimate a large number of parameters from relatively few observations. To address this problem, this paper proposes a novel subspace-based GMM clustering ensemble (SubGMM-CE) algorithm tailored to HDD. SubGMM-CE comprises three key components: (1) a series of low-dimensional subspaces is determined dynamically, taking the optimal number of GMM components into account; (2) GMM-based clustering is applied in each subspace to obtain a set of heterogeneous GMM models; and (3) the resulting GMM base clusterings are merged by a newly designed relabeling strategy based on the average shared affiliation probability, producing the final clustering of the high-dimensional unlabeled data. An extensive experimental evaluation validates the feasibility, rationality, effectiveness, and noise robustness of SubGMM-CE. The results show that SubGMM-CE yields more stable and more accurate clustering results, outperforming nine state-of-the-art clustering algorithms in normalized mutual information (NMI), clustering accuracy (ACC), and adjusted Rand index (ARI). These results demonstrate the viability of SubGMM-CE for clustering HDD.
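The abstract describes the pipeline only at a high level, so the Python sketch below is an illustration under stated assumptions, not the authors' SubGMM-CE. It first counts the free parameters of a full-covariance GMM (K·d means, K·d(d+1)/2 covariance entries, K−1 mixing weights) to show why direct fitting struggles as d grows. It then mimics the three components with deliberate simplifications: subspaces are drawn at random with a fixed number of components `k` rather than being determined dynamically with an optimal component count; each subspace is fitted with a minimal diagonal-covariance EM; and base results are merged through a co-association matrix (our assumed reading of "average shared affiliation probability") followed by connected components at a 0.5 threshold, which stands in for the paper's relabeling strategy. All function and parameter names (`gmm_param_count`, `fit_diag_gmm`, `subspace_gmm_ensemble`, `n_models`, `subspace_dim`, `threshold`) are hypothetical.

```python
import math
import random


def gmm_param_count(d, k):
    """Free parameters of a full-covariance GMM in d dimensions with k components."""
    means = k * d
    covariances = k * d * (d + 1) // 2  # each covariance matrix is symmetric
    weights = k - 1  # mixing weights must sum to 1
    return means + covariances + weights


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_diag_gmm(X, k, iters=50):
    """Minimal EM for a diagonal-covariance GMM; returns soft memberships (n x k)."""
    n, d = len(X), len(X[0])
    # Deterministic maximin initialisation: start from the point farthest from
    # the data mean, then repeatedly add the point farthest from chosen centres.
    mean = [sum(x[t] for x in X) / n for t in range(d)]
    centres = [max(X, key=lambda x: sq_dist(x, mean))]
    while len(centres) < k:
        centres.append(max(X, key=lambda x: min(sq_dist(x, c) for c in centres)))
    means = [list(c) for c in centres]
    variances = [[1.0] * d for _ in range(k)]  # assumes roughly unit-scale features
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: log-domain responsibilities, normalised with log-sum-exp.
        resp = []
        for x in X:
            logs = []
            for j in range(k):
                lp = math.log(weights[j])
                for a, m, v in zip(x, means[j], variances[j]):
                    lp -= 0.5 * (math.log(2 * math.pi * v) + (a - m) ** 2 / v)
                logs.append(lp)
            top = max(logs)
            ex = [math.exp(l - top) for l in logs]
            total = sum(ex)
            resp.append([e / total for e in ex])
        # M-step: re-estimate weights, means and per-feature variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = [sum(r[j] * x[t] for r, x in zip(resp, X)) / nj for t in range(d)]
            variances[j] = [
                max(sum(r[j] * (x[t] - means[j][t]) ** 2 for r, x in zip(resp, X)) / nj, 1e-6)
                for t in range(d)
            ]
    return resp


def subspace_gmm_ensemble(X, n_models=5, subspace_dim=2, k=2, threshold=0.5, seed=0):
    """Hypothetical SubGMM-CE-style pipeline: random subspaces, one GMM each, consensus."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    # S[i][j]: probability that points i and j share a component, averaged over
    # base models (assumed reading of "average shared affiliation probability").
    S = [[0.0] * n for _ in range(n)]
    for _ in range(n_models):
        feats = rng.sample(range(d), subspace_dim)  # random, not optimised, subspace
        P = fit_diag_gmm([[x[f] for f in feats] for x in X], k)
        for i in range(n):
            for j in range(n):
                S[i][j] += sum(a * b for a, b in zip(P[i], P[j])) / n_models
    # Consensus labels: connected components of the graph with edges where S >= threshold.
    labels, current = [-1] * n, 0
    for i in range(n):
        if labels[i] != -1:
            continue
        labels[i], stack = current, [i]
        while stack:
            u = stack.pop()
            for v in range(n):
                if labels[v] == -1 and S[u][v] >= threshold:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels


if __name__ == "__main__":
    rng = random.Random(1)
    X = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(30)]
    X += [[rng.gauss(8.0, 1.0) for _ in range(8)] for _ in range(30)]
    print(gmm_param_count(1000, 10))  # 5015009
    print(subspace_gmm_ensemble(X))
```

With d = 1000 and K = 10, a full-covariance GMM already has 5,015,009 free parameters, whereas each 2-dimensional diagonal base model in the sketch has only 9 (four means, four variances, one free weight). On the well-separated toy data in the `__main__` block, each base model should split the two groups cleanly, so the consensus is expected to recover them; a faithful reproduction would need the paper's subspace-selection and relabeling procedures instead of these stand-ins.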
Journal introduction:
CJE focuses on emerging fields of electronics and publishes innovative and transformative research papers. Most papers published in CJE come from universities and research institutes and present their innovative research results. Both theoretical and practical contributions are encouraged, and original research papers reporting novel solutions to hot topics in electronics are especially welcome.