Clustering Ensemble Approach Based on Incremental Learning

S. Khedairia, Imene Houari, Manel Ababsia, Tarek Khadir
Published in: Proceedings of the 9th International Conference on Information Systems and Technologies, 2019-03-24
DOI: 10.1145/3361570.3361603 (https://doi.org/10.1145/3361570.3361603)

Abstract

A clustering ensemble combines multiple clustering results into a consensus clustering that is likely to be better and more robust than any single result. The technique has proven effective at finding bizarre clusters, dealing with noise, and integrating clustering solutions from multiple distributed sources. Consensus clustering methods based on a voting mechanism are widely used in the literature. The idea behind majority voting is that the judgement of a group is superior to that of individuals. However, voting-based consensus methods struggle to assign an appropriate cluster label to data objects that receive no majority vote. To deal with this ambiguity, and with clustering when datasets are too large or when new information can arrive dynamically at any time, we propose a new clustering approach based on a two-stage clustering technique. In the first stage, a clustering ensemble method based on a relabeling and voting process is used to cluster the data objects. A new set of disjoint sub-clusters is generated by majority vote, where each data object votes for the cluster to which it belongs and for its corresponding cluster in each of the other clustering results. Data objects without a majority vote are collected in a new dataset. In the second stage, the new dataset and the set of previously obtained sub-clusters are processed using an incremental clustering algorithm. The incremental clustering algorithm is initialized with the obtained sub-clusters and operates on the elements of the new dataset. The main advantage of incremental clustering methods is that the system can update its assumptions based on newly available learning data without re-examining old data. The proposed approach has been evaluated on different datasets, and the experimental results demonstrate the effectiveness and robustness of the proposed method.
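
The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how relabeling is done or which incremental algorithm is used. The sketch assumes a greedy maximum-overlap relabeling against the first clustering, a strict majority threshold (more than half of the votes), and a sequential nearest-centroid update for the second stage. All function names and the toy data are hypothetical, and at least one sub-cluster is assumed to survive the vote.

```python
from collections import Counter


def relabel(reference, labels):
    """Map each label in `labels` to the reference label it overlaps most (greedy, assumed)."""
    overlap = Counter(zip(labels, reference))
    ref_labels = sorted(set(reference))
    mapping = {lab: max(ref_labels, key=lambda ref: overlap[(lab, ref)])
               for lab in set(labels)}
    return [mapping[lab] for lab in labels]


def majority_vote(clusterings):
    """Stage 1: one label per object, or None where no label has a strict majority."""
    reference = clusterings[0]
    aligned = [reference] + [relabel(reference, c) for c in clusterings[1:]]
    consensus = []
    for votes in zip(*aligned):
        label, count = Counter(votes).most_common(1)[0]
        consensus.append(label if count > len(votes) / 2 else None)
    return consensus


def incremental_assign(points, consensus):
    """Stage 2: seed centroids from the voted sub-clusters, then absorb undecided points."""
    sums, counts = {}, Counter()
    for p, lab in zip(points, consensus):
        if lab is not None:
            acc = sums.setdefault(lab, [0.0] * len(p))
            for i, x in enumerate(p):
                acc[i] += x
            counts[lab] += 1
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}
    final = list(consensus)
    for idx, p in enumerate(points):
        if final[idx] is None:
            nearest = min(
                centroids,
                key=lambda lab: sum((a - b) ** 2 for a, b in zip(p, centroids[lab])),
            )
            counts[nearest] += 1
            # Running-mean update: only the centroid moves; old points are not revisited.
            centroids[nearest] = [c + (x - c) / counts[nearest]
                                  for c, x in zip(centroids[nearest], p)]
            final[idx] = nearest
    return final


# Toy data (hypothetical): the last point splits the vote 2-2 across four base clusterings.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (1.0, 1.0)]
clusterings = [
    [0, 0, 0, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],  # same groups as the reference, labels swapped; last point differs
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
]
consensus = majority_vote(clusterings)
final = incremental_assign(points, consensus)
print(consensus)  # [0, 0, 0, 1, 1, None]
print(final)      # [0, 0, 0, 1, 1, 0]
```

In this toy run the first five objects receive a unanimous label after relabeling, while the sixth is left undecided and is then absorbed by the nearer of the two seeded centroids. An optimal one-to-one label matching (for example the Hungarian algorithm) could replace the greedy mapping where base clusterings disagree more strongly.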