Clustering method using hypergraph models based on Set Pair Analysis

2009 IEEE International Symposium on IT in Medicine & Education. Pub Date: 2009-09-15. DOI: 10.1109/ITIME.2009.5236279

Guo-ping Lin, Shao-Zi Li


Abstract

Text clustering methods can be used to structure large sets of text or hypertext documents. However, many well-known text clustering methods do not really address the problems specific to text: the very high dimensionality of the data and the understandability of the cluster descriptions. In this paper, we introduce a novel approach to text clustering based on a hypergraph model that uses Set Pair Analysis (SPA), a new methodology for describing and processing system uncertainty. In this method, we define a new measure of text similarity in terms of the identical, different, and contrary components of a set pair. Once the hypergraph model has been set up, a hypergraph partitioning algorithm is used to find clusters. The new method eliminates disadvantageous factors, reduces the dimensionality of the text, and improves the speed and accuracy of text clustering. The experiment demonstrates that our approach is applicable and effective on high-dimensional text datasets.
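The abstract does not spell out how the identical, different, and contrary components are computed for a pair of texts. As a rough illustration only, the sketch below uses the standard SPA connection degree, mu = a + b*i + c*j, where a, b, and c are the fractions of identical, different, and contrary features, i is an uncertainty coefficient in [-1, 1], and j is conventionally -1. The rule for assigning a term to each component (the `tol` threshold on term-frequency ratios) and the function names `set_pair_degrees` and `connection_degree` are assumptions made for this example, not the authors' definitions.

```python
from collections import Counter


def set_pair_degrees(doc_a, doc_b, tol=0.5):
    """Return (a, b, c): fractions of identical, different, contrary terms.

    Assumed rule (not from the paper): a term is
      - contrary  if it occurs in only one of the two documents,
      - identical if it occurs in both and the smaller count is at
        least `tol` times the larger count,
      - different otherwise (present in both, but with dissimilar counts).
    """
    freq_a, freq_b = Counter(doc_a), Counter(doc_b)
    terms = set(freq_a) | set(freq_b)
    if not terms:
        return 0.0, 0.0, 0.0
    same = diff = contrary = 0
    for term in terms:
        x, y = freq_a[term], freq_b[term]
        if x == 0 or y == 0:
            contrary += 1
        elif min(x, y) / max(x, y) >= tol:
            same += 1
        else:
            diff += 1
    n = len(terms)
    return same / n, diff / n, contrary / n


def connection_degree(a, b, c, i=0.0, j=-1.0):
    """SPA connection degree mu = a + b*i + c*j for chosen i and j."""
    return a + b * i + c * j


doc_1 = ["graph", "graph", "cluster", "text"]
doc_2 = ["graph", "cluster", "cluster", "cluster", "cluster", "image"]
a, b, c = set_pair_degrees(doc_1, doc_2)
similarity = connection_degree(a, b, c)
```

In a pipeline like the one the abstract describes, a pairwise score of this kind could feed the construction of hyperedges (for example, grouping documents whose mutual scores exceed a threshold) before a hypergraph partitioner is applied; that step is not shown here because the source gives no details about it.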

