An ensemble approach for generating partitional clusters from multiple cluster hierarchies
Mahmood Hossain, S. Bridges, Yong Wang, J. Hodges
2006 IEEE International Conference on Granular Computing, published 2006-05-10
DOI: 10.1109/GRC.2006.1635890 (https://doi.org/10.1109/GRC.2006.1635890)
Citations: 5
Abstract
Traditional clustering is typically based on a single feature set. In some domains, several feature sets may be available to represent the same objects, but computing an integrated feature set may not be easy. We have developed the EPaCH (Ensemble method for generating Partitional clusters from multiple Cluster Hierarchies) algorithm to combine the results of hierarchical clustering from multiple related datasets, where the datasets represent the same set of objects but use different feature sets. EPaCH uses a graph-theoretic approach to combine the hierarchies into a single set of partitional clusters. A graph is generated from the hierarchies based on the association strengths of objects in the hierarchies, and a graph partitioning algorithm is then applied to generate flat clusters. EPaCH was tested empirically with a document collection consisting of journal abstracts from ten different Library of Congress categories. Both syntactic and semantic feature sets were extracted, and the resulting datasets were clustered individually using average-link agglomerative hierarchical clustering. EPaCH was then used to generate a single set of flat clusters from the dendrograms. In the document clustering domain, EPaCH is shown to yield higher-quality clusters than both phylogeny-based ensemble methods and clustering based on a single feature set on three of four measures of cluster quality.
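The pipeline described above (hierarchies, then an association graph, then flat clusters) can be illustrated with a minimal sketch. This is not the EPaCH implementation: the abstract does not define the association-strength measure or name the graph partitioning algorithm, so both are assumptions here. The sketch scores a pair of objects by the size of the smallest cluster that contains both (1.0 for objects merged first, 0.0 for objects joined only at the root), averages that score over the hierarchies, and substitutes a simple threshold-plus-connected-components step for a real graph partitioner. The function names, the nested-tuple dendrogram encoding, and the toy hierarchies are all hypothetical.

```python
from itertools import combinations


def clusters_of(tree):
    """Return (leaves, clusters) for a dendrogram encoded as nested tuples.

    `clusters` holds the leaf set of every internal node, root included.
    """
    if not isinstance(tree, tuple):
        return frozenset([tree]), []
    leaves, clusters = frozenset(), []
    for child in tree:
        child_leaves, child_clusters = clusters_of(child)
        leaves |= child_leaves
        clusters += child_clusters
    clusters.append(leaves)
    return leaves, clusters


def association(tree):
    """Assumed strength measure: 1.0 if two objects merge first, 0.0 if only at the root.

    Requires more than two leaves.
    """
    leaves, clusters = clusters_of(tree)
    n = len(leaves)
    strengths = {}
    for a, b in combinations(sorted(leaves), 2):
        smallest = min(len(c) for c in clusters if a in c and b in c)
        strengths[(a, b)] = 1.0 - (smallest - 2) / (n - 2)
    return strengths


def combine(trees):
    """Build one weighted graph by averaging pair strengths over all hierarchies."""
    total = {}
    for tree in trees:
        for pair, strength in association(tree).items():
            total[pair] = total.get(pair, 0.0) + strength
    return {pair: s / len(trees) for pair, s in total.items()}


def flat_clusters(graph, threshold):
    """Stand-in partitioner: keep edges >= threshold, return connected components."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (a, b), weight in graph.items():
        root_a, root_b = find(a), find(b)
        if weight >= threshold and root_a != root_b:
            parent[root_a] = root_b
    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), []).append(x)
    return sorted(sorted(g) for g in groups.values())


# Two toy hierarchies over the same six objects (hypothetical data),
# standing in for dendrograms built from two different feature sets.
syntactic = ((("a", "b"), "c"), (("d", "e"), "f"))
semantic = (("a", ("b", "c")), ("d", ("e", "f")))

graph = combine([syntactic, semantic])
print(flat_clusters(graph, threshold=0.5))  # [['a', 'b', 'c'], ['d', 'e', 'f']]
```

The two toy hierarchies disagree on which pairs merge first but agree on the two top-level groups, so the averaged graph has strong edges within each group and zero-weight edges across them. A real graph partitioner would typically also balance cluster sizes and take the number of clusters as input, which the threshold step here does not.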