{"title":"Clustering method using hypergraph models based on Set Pair Analysis","authors":"Guo-ping Lin, Shao-Zi Li","doi":"10.1109/ITIME.2009.5236279","DOIUrl":null,"url":null,"abstract":"Text clustering methods can be used to structure large sets of text or hypertext documents. However, a lot of well-known methods for text clustering do not really address the special problems of text clustering: very high dimensionality of the data and understandability of the cluster description. In this paper, we introduce a novel approach which is based on the hypergraph model of text clustering by using Set Pair Analysis (SPA) that is a new methodology to describe and process system uncertainty. In this method, we define a new measure for text similarity by the identical, different, and contrary of Set Pair. After setting up the hypergraph model, a hypergraph partitioning algorithm will be used to find clusters. The new method can eliminate disadvantageous factors and decreases the textual dimension of text and enhances the speed and accuracy of the text clustering. The experiment demonstrates that our approach is applicable and effective in high dimensional textual datasets.","PeriodicalId":398477,"journal":{"name":"2009 IEEE International Symposium on IT in Medicine & Education","volume":"50 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 IEEE International Symposium on IT in Medicine & Education","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ITIME.2009.5236279","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Text clustering methods can be used to structure large sets of text or hypertext documents. However, a lot of well-known methods for text clustering do not really address the special problems of text clustering: very high dimensionality of the data and understandability of the cluster description. In this paper, we introduce a novel approach which is based on the hypergraph model of text clustering by using Set Pair Analysis (SPA) that is a new methodology to describe and process system uncertainty. In this method, we define a new measure for text similarity by the identical, different, and contrary of Set Pair. After setting up the hypergraph model, a hypergraph partitioning algorithm will be used to find clusters. The new method can eliminate disadvantageous factors and decreases the textual dimension of text and enhances the speed and accuracy of the text clustering. The experiment demonstrates that our approach is applicable and effective in high dimensional textual datasets.