Clustering method using hypergraph models based on Set Pair Analysis

Guo-ping Lin, Shao-Zi Li
2009 IEEE International Symposium on IT in Medicine & Education, published 2009-09-15. DOI: 10.1109/ITIME.2009.5236279 (https://doi.org/10.1109/ITIME.2009.5236279)
Citations: 0

Abstract

Text clustering methods can be used to structure large sets of text or hypertext documents. However, many well-known text clustering methods do not address two problems specific to text: the very high dimensionality of the data and the understandability of the cluster descriptions. In this paper, we introduce a novel text clustering approach based on a hypergraph model built with Set Pair Analysis (SPA), a new methodology for describing and processing system uncertainty. We define a new text similarity measure from the identity, discrepancy, and contrary degrees of a set pair. Once the hypergraph model is set up, a hypergraph partitioning algorithm is used to find the clusters. The new method eliminates disadvantageous factors, reduces the dimensionality of the text representation, and improves the speed and accuracy of text clustering. Experiments demonstrate that the approach is applicable and effective on high-dimensional text datasets.
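The abstract does not give the similarity formula or the hyperedge construction, so the following is only a minimal sketch under stated assumptions. It uses the standard SPA connection degree mu = a + b*i + c*j, where a, b, and c are the identity, discrepancy, and contrary degrees (a + b + c = 1), i lies in [-1, 1], and j = -1. The rule for sorting terms into the three groups, the weight threshold, the neighbourhood-style hyperedges, and every function name (`spa_degrees`, `spa_similarity`, `build_hyperedges`) are hypothetical choices made for illustration, not the authors' method.

```python
from itertools import combinations


def spa_degrees(doc_a, doc_b, threshold=0.3):
    """Return (a, b, c): identity, discrepancy and contrary degrees.

    doc_a and doc_b map terms to non-negative weights (e.g. TF-IDF).
    The three-way split below is an illustrative assumption.
    """
    terms = set(doc_a) | set(doc_b)
    if not terms:
        return 0.0, 0.0, 0.0
    identical = different = contrary = 0
    for term in terms:
        wa = doc_a.get(term, 0.0)
        wb = doc_b.get(term, 0.0)
        if wa >= threshold and wb >= threshold:
            identical += 1   # both documents stress the term
        elif max(wa, wb) >= threshold and min(wa, wb) == 0.0:
            contrary += 1    # stressed in one document, absent in the other
        else:
            different += 1   # uncertain: weak in at least one document
    n = len(terms)
    return identical / n, different / n, contrary / n


def spa_similarity(doc_a, doc_b, threshold=0.3, i=0.0):
    """Collapse mu = a + b*i + c*j to one number, with j = -1 and i in [-1, 1]."""
    a, b, c = spa_degrees(doc_a, doc_b, threshold)
    return a + b * i - c


def build_hyperedges(docs, min_similarity=0.0, threshold=0.3):
    """Assumed construction: one hyperedge per document, holding that
    document plus every other document at least min_similarity close."""
    names = sorted(docs)
    edges = {name: {name} for name in names}
    for x, y in combinations(names, 2):
        if spa_similarity(docs[x], docs[y], threshold) >= min_similarity:
            edges[x].add(y)
            edges[y].add(x)
    # Keep only hyperedges that connect at least two documents.
    return {frozenset(e) for e in edges.values() if len(e) > 1}
```

The resulting hyperedges would then be handed to a hypergraph partitioning algorithm to obtain the clusters; that step is left out here because the abstract does not say which partitioner is used.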