Frequent term based peer-to-peer text clustering

2010 Third International Symposium on Knowledge Acquisition and Modeling Pub Date : 2010-11-29 DOI:10.1109/KAM.2010.5646177

Qing He, Tingting Li, Fuzhen Zhuang, Zhongzhi Shi


Citations: 9

Abstract



Text clustering is an important technique for automatically structuring large document collections, and it is even more valuable in peer-to-peer networks. Because documents are high-dimensional, much communication can be saved if each node obtains an approximate clustering result through a distributed algorithm instead of transferring its documents to a central site for clustering. Most existing text clustering algorithms for unstructured peer-to-peer networks are based on the K-means algorithm. A problem with those algorithms is that clustering quality may decrease as the network grows. In this paper, we propose a text clustering algorithm based on frequent term sets for peer-to-peer networks. It requires relatively low communication volume while achieving a clustering result whose quality is not affected by the size of the network. Moreover, it produces a term set describing each cluster, which gives people a clear understanding of the clustering result and helps users find resources in the network or manage their local documents in accordance with the whole network.
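The abstract does not spell out the algorithm, so the following is only a minimal single-node sketch of the general idea of clustering by frequent term sets, not the authors' peer-to-peer method. It assumes documents are already tokenized into term lists, mines term sets that occur in at least a `min_support` fraction of documents, and then greedily picks the term sets that cover the most still-unassigned documents. The function names, the `min_support` and `max_size` parameters, and the greedy coverage rule are all illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations


def frequent_term_sets(docs, min_support, max_size=2):
    """Map each frequent term set to the set of document indices containing it.

    A term set is "frequent" when it occurs in at least min_support * len(docs)
    documents. Level-wise (Apriori-style) growth up to max_size terms.
    """
    min_count = min_support * len(docs)

    # Level 1: which documents contain each single term.
    cover = defaultdict(set)
    for i, doc in enumerate(docs):
        for term in set(doc):
            cover[frozenset([term])].add(i)
    current = {ts: c for ts, c in cover.items() if len(c) >= min_count}
    result = dict(current)

    # Higher levels: join pairs of frequent sets from the previous level.
    for size in range(2, max_size + 1):
        nxt = {}
        for (a, docs_a), (b, docs_b) in combinations(current.items(), 2):
            union = a | b
            if len(union) != size:
                continue
            shared = docs_a & docs_b
            if len(shared) >= min_count:
                nxt[union] = shared
        result.update(nxt)
        current = nxt
    return result


def greedy_clusters(docs, min_support, max_size=2):
    """Assumed simplification: repeatedly pick the frequent term set that
    covers the most unassigned documents (ties: larger set, then term order).

    Returns a list of (describing_terms, document_indices) pairs.
    """
    candidates = frequent_term_sets(docs, min_support, max_size)
    unassigned = set(range(len(docs)))
    clusters = []
    while unassigned and candidates:
        best = max(
            candidates,
            key=lambda ts: (len(candidates[ts] & unassigned), len(ts), sorted(ts)),
        )
        members = candidates.pop(best) & unassigned
        if not members:
            break  # no remaining term set covers any unassigned document
        clusters.append((sorted(best), sorted(members)))
        unassigned -= members
    return clusters


if __name__ == "__main__":
    toy_docs = [
        ["peer", "network", "node"],
        ["peer", "network", "routing"],
        ["cluster", "text", "term"],
        ["cluster", "text", "document"],
    ]
    print(greedy_clusters(toy_docs, min_support=0.5))
```

Each returned cluster carries the term set that defines it, which mirrors the abstract's point that a describing term set makes the result easier to interpret. In a distributed setting the nodes would presumably exchange term-set statistics rather than full document vectors, but how the paper does this is not stated here.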

