DenMG: Density-Based Member Generation for Ensemble Clustering

Workshop Proceedings of the 51st International Conference on Parallel Processing. Pub Date: 2022-08-29. DOI: 10.1145/3547276.3548520

Xueqin Du, Yulin He, Philippe Fournier-Viger, J. Huang

Abstract

Ensemble clustering is a popular approach to identifying clusters in data: it combines the results of multiple clustering algorithms to obtain more accurate and robust clusters. However, the performance of an ensemble clustering algorithm depends heavily on the quality of its members. Based on this observation, this paper proposes a density-based member generation (DenMG) algorithm that selects ensemble members by considering distribution consistency. DenMG has two main components: the first splits sample points from a heterocluster, and the second merges sample points to form a homocluster. The first component estimates two probability density functions (p.d.f.s) from a heterocluster's sample points and represents them using a Gaussian distribution and a Gaussian mixture model. If random numbers generated from these two p.d.f.s are deemed to follow different probability distributions, the heterocluster is split into smaller clusters. The second component merges clusters that have high neighborhood densities into a homocluster, using an opposite-oriented criterion that measures neighborhood density. A series of experiments was conducted to demonstrate the feasibility and effectiveness of the proposed ensemble member generation algorithm. The results show that the proposed algorithm generates high-quality ensemble members and, as a result, yields better clustering than five state-of-the-art ensemble clustering algorithms.
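The splitting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the two sets of random numbers are compared, so this sketch assumes one-dimensional data, a two-component mixture fitted by plain EM, and a two-sample Kolmogorov-Smirnov test at the 5% level. All function names (`fit_gaussian`, `fit_gmm2`, `should_split`, `split_cluster`) are hypothetical.

```python
import math
import random

def fit_gaussian(xs):
    """Maximum-likelihood mean and standard deviation of a 1-D sample."""
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return mu, math.sqrt(var)

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def fit_gmm2(xs, iters=50):
    """Two-component 1-D Gaussian mixture fitted by EM (assumed model size)."""
    s = sorted(xs)
    half = len(s) // 2
    params = []  # each entry is [weight, mean, std]
    for part in (s[:half], s[half:]):  # initialise from lower/upper halves
        mu, sigma = fit_gaussian(part)
        params.append([0.5, mu, max(sigma, 1e-3)])
    for _ in range(iters):
        resp = []  # E-step: responsibilities of each component per point
        for x in xs:
            p = [w * normal_pdf(x, mu, sg) for w, mu, sg in params]
            total = sum(p) or 1e-300
            resp.append([v / total for v in p])
        for k in range(2):  # M-step: re-estimate weight, mean, std
            nk = sum(r[k] for r in resp) or 1e-300
            mu = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu) ** 2 for r, x in zip(resp, xs)) / nk
            params[k] = [nk / len(xs), mu, max(math.sqrt(var), 1e-3)]
    return params

def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of two samples."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def should_split(xs, n_draws=2000, seed=0):
    """Draw from both fitted p.d.f.s and test whether the draws differ."""
    rng = random.Random(seed)
    mu, sigma = fit_gaussian(xs)
    gmm = fit_gmm2(xs)
    gauss_draws = [rng.gauss(mu, sigma) for _ in range(n_draws)]
    gmm_draws = []
    for _ in range(n_draws):
        k = 0 if rng.random() < gmm[0][0] else 1
        gmm_draws.append(rng.gauss(gmm[k][1], gmm[k][2]))
    # KS critical value for alpha = 0.05 with equal sample sizes (assumed test)
    critical = 1.358 * math.sqrt(2.0 / n_draws)
    return ks_statistic(gauss_draws, gmm_draws) > critical, gmm

def split_cluster(xs, gmm):
    """Assign each point to the mixture component that explains it best."""
    parts = ([], [])
    for x in xs:
        scores = [w * normal_pdf(x, mu, sg) for w, mu, sg in gmm]
        parts[0 if scores[0] >= scores[1] else 1].append(x)
    return parts

# Usage: a synthetic "heterocluster" made of two well-separated groups.
data_rng = random.Random(42)
cluster = ([data_rng.gauss(0, 1) for _ in range(200)]
           + [data_rng.gauss(8, 1) for _ in range(200)])
split, gmm = should_split(cluster)
parts = split_cluster(cluster, gmm) if split else (cluster, [])
```

When the cluster really contains two groups, the single Gaussian is a poor fit while the mixture is a good one, so draws from the two models disagree and the cluster is split. The merging component is not sketched here because the abstract does not define its opposite-oriented criterion.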
