Leveraging frequency and diversity based ensemble selection to consensus clustering
Arko Banerjee
2014 Seventh International Conference on Contemporary Computing (IC3), published 2014-09-15
DOI: 10.1109/IC3.2014.6897160 (https://doi.org/10.1109/IC3.2014.6897160)
Citations: 3
Abstract
Consensus clustering, also called clustering aggregation (or aggregation of partitions), aims to improve the robustness and quality of clustering a dataset by optimally reconciling the results of different clusterings of the same dataset generated in different ways. This paper proposes a novel way of arriving at a consensus clustering through an ensemble selection strategy. Rather than considering the entire ensemble, the method judiciously selects a few clusterings from it without compromising the quality of the consensus. It begins by sorting the ensemble, prioritizing clusterings based on diversity and frequency. Considering diversity and frequency jointly is observed to help identify a few representative partitions that have high potential to form a qualitatively better consensus than the entire ensemble does. Finally, a greedy strategy selects clusterings within an iterative consensus generation technique that ensures the internal quality of the clustering is monotonically non-decreasing. Empirical results show that the consensus clustering obtained by the proposed algorithm gives better clustering accuracy for many datasets.
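The pipeline outlined in the abstract (rank the ensemble by frequency and diversity, then greedily grow the selected set while the consensus quality does not drop) can be illustrated with a minimal Python sketch. The abstract does not say how diversity, frequency, the consensus function, or internal quality are computed, so every concrete choice here is an assumption made for illustration and not the paper's method: frequency counts identical partitions up to relabeling, diversity is the mean pairwise disagreement (1 minus the Rand index), consensus is a majority-vote co-association with connected components, and quality is a toy separation score on 1-D points. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def canonical(labels):
    """Relabel clusters by first appearance so identical partitions compare equal."""
    mapping = {}
    return tuple(mapping.setdefault(l, len(mapping)) for l in labels)


def rand_disagreement(a, b):
    """Fraction of object pairs on which a and b disagree (1 - Rand index)."""
    pairs = list(combinations(range(len(a)), 2))
    differ = sum((a[i] == a[j]) != (b[i] == b[j]) for i, j in pairs)
    return differ / len(pairs)


def rank_ensemble(ensemble):
    """Assumed ranking: frequency first, then mean disagreement with the others."""
    freq = Counter(canonical(p) for p in ensemble)
    distinct = list(freq)

    def diversity(p):
        others = [q for q in distinct if q != p]
        if not others:
            return 0.0
        return sum(rand_disagreement(p, q) for q in others) / len(others)

    return sorted(distinct, key=lambda p: (freq[p], diversity(p)), reverse=True)


def consensus(partitions):
    """Assumed consensus: link i and j when a strict majority co-clusters them,
    then return the connected components as clusters (union-find)."""
    n = len(partitions[0])
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        votes = sum(p[i] == p[j] for p in partitions)
        if 2 * votes > len(partitions):
            parent[find(i)] = find(j)
    return canonical([find(i) for i in range(n)])


def greedy_select(ensemble, quality):
    """Add ranked partitions one by one, keeping a candidate only if the
    consensus quality does not decrease (so quality is non-decreasing)."""
    ranked = rank_ensemble(ensemble)
    selected = [ranked[0]]
    best = consensus(selected)
    best_q = quality(best)
    for cand in ranked[1:]:
        trial = consensus(selected + [cand])
        q = quality(trial)
        if q >= best_q:
            selected.append(cand)
            best, best_q = trial, q
    return best, selected


def make_quality(points):
    """Toy internal index (assumption): mean between-cluster distance minus
    mean within-cluster distance on 1-D points."""
    def mean(xs):
        return sum(xs) / len(xs) if xs else 0.0

    def quality(labels):
        idx = list(combinations(range(len(points)), 2))
        within = [abs(points[i] - points[j]) for i, j in idx if labels[i] == labels[j]]
        between = [abs(points[i] - points[j]) for i, j in idx if labels[i] != labels[j]]
        return mean(between) - mean(within)

    return quality


# Usage on made-up data: two well-separated groups and a small ensemble.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
ensemble = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same partition, different label names
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
best, selected = greedy_select(ensemble, make_quality(points))
print(best, len(selected))
```

The acceptance test `q >= best_q` is what makes the quality sequence monotonically non-decreasing in this sketch. A real implementation would substitute the paper's own ranking, consensus function, and internal validity index.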