An ensemble approach for generating partitional clusters from multiple cluster hierarchies

Mahmood Hossain, S. Bridges, Yong Wang, J. Hodges
Published in: 2006 IEEE International Conference on Granular Computing
Publication date: 2006-05-10
DOI: 10.1109/GRC.2006.1635890 (https://doi.org/10.1109/GRC.2006.1635890)
Citations: 5

Abstract

Traditional clustering is typically based on a single feature set. In some domains, several feature sets may be available to represent the same objects, but computing an integrated feature set may not be easy. We have developed the EPaCH (Ensemble method for generating Partitional clusters from multiple Cluster Hierarchies) algorithm to combine the results of hierarchical clustering from multiple related datasets that represent the same set of objects but use different feature sets. EPaCH uses a graph-theoretic approach to combine the hierarchies into a single set of partitional clusters: a graph is generated from the hierarchies based on the association strengths of objects in the hierarchies, and a graph partitioning algorithm is then applied to generate flat clusters. EPaCH was tested empirically on a document collection consisting of journal abstracts from ten different Library of Congress categories. Both syntactic and semantic feature sets were extracted, and the resulting datasets were clustered individually using average-link agglomerative hierarchical clustering. EPaCH was then used to generate a single set of flat clusters from the dendrograms. In the document clustering domain, EPaCH is shown to yield higher-quality clusters than both phylogeny-based ensemble methods and clustering based on a single feature set for three of four measures of cluster quality.
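The pipeline described in the abstract (hierarchies, then an association graph, then flat clusters) can be illustrated with a small sketch. The abstract does not define the association strength or name the graph partitioning algorithm, so everything specific here is an assumption made for illustration: dendrograms are written as nested tuples, the strength of a pair is taken to be higher the smaller the first cluster that contains both objects, edge weights are summed across hierarchies, and a simple heaviest-edge merge (Kruskal-style, stopped at k components) stands in for the partitioner. It should not be read as the authors' implementation.

```python
from itertools import combinations, product


def leaves(node):
    """Return the leaf labels under a nested-tuple dendrogram node."""
    if not isinstance(node, tuple):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def association_strengths(tree, n):
    """Map each object pair to a strength in [0, 1] for one hierarchy.

    Assumed measure (not from the paper): a pair first joined in a
    cluster of size 2 scores 1; a pair joined only at the root scores 0.
    Requires n > 2.
    """
    strengths = {}

    def visit(node):
        if not isinstance(node, tuple):
            return
        size = len(leaves(node))
        groups = [leaves(child) for child in node]
        # Pairs split across two children are first joined at this node.
        for g1, g2 in combinations(groups, 2):
            for a, b in product(g1, g2):
                strengths[frozenset((a, b))] = (n - size) / (n - 2)
        for child in node:
            visit(child)

    visit(tree)
    return strengths


def combine(hierarchies):
    """Build one weighted graph by summing strengths over all hierarchies."""
    n = len(leaves(hierarchies[0]))
    graph = {}
    for tree in hierarchies:
        for pair, s in association_strengths(tree, n).items():
            graph[pair] = graph.get(pair, 0.0) + s
    return graph


def partition(graph, objects, k):
    """Stand-in partitioner: merge along heaviest edges until k groups remain."""
    parent = {o: o for o in objects}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = len(objects)
    for pair, _ in sorted(graph.items(), key=lambda kv: -kv[1]):
        if components == k:
            break
        a, b = tuple(pair)
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            components -= 1
    clusters = {}
    for o in objects:
        clusters.setdefault(find(o), []).append(o)
    return sorted(sorted(c) for c in clusters.values())


# Two hypothetical dendrograms over the same six objects, standing in for
# hierarchies built from a syntactic and a semantic feature set.
syntactic = (((0, 1), 2), ((3, 4), 5))
semantic = ((0, (1, 2)), (3, (4, 5)))

graph = combine([syntactic, semantic])
clusters = partition(graph, objects=range(6), k=2)
print(clusters)  # [[0, 1, 2], [3, 4, 5]]
```

In this toy input the two hierarchies disagree on which pairs merge first but agree on the top-level split, so the summed edge weights are large within each group and zero across groups. A real partitioner would normally also balance cluster sizes, which the stand-in above does not attempt.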