Frequent term based peer-to-peer text clustering

Qing He, Tingting Li, Fuzhen Zhuang, Zhongzhi Shi
2010 Third International Symposium on Knowledge Acquisition and Modeling, published 29 November 2010. DOI: 10.1109/KAM.2010.5646177
Citations: 9

Abstract

Text clustering is an important technology for automatically structuring large document collections, and it is even more valuable in peer-to-peer networks. Because documents are high-dimensional, considerable communication can be saved if each node obtains an approximate clustering result through a distributed algorithm instead of transferring all documents to a central node for clustering. Most existing text clustering algorithms for unstructured peer-to-peer networks are based on the K-means algorithm. A problem with these algorithms is that clustering quality may decrease as the network grows. In this paper, we propose a text clustering algorithm for peer-to-peer networks based on frequent term sets. It requires relatively low communication volume while achieving a clustering result whose quality is not affected by the size of the network. Moreover, it produces a term set describing each cluster, which helps people clearly understand the clustering result and helps users find resources in the network or organize their local documents consistently with the whole network.
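The abstract does not spell out the algorithm itself. As a rough illustration only, the sketch below shows the general idea behind frequent-term-set clustering, adapted so that peers exchange only term-set support counts rather than documents. Everything here is an assumption for illustration and not the authors' method: the function names `global_frequent_termsets` and `greedy_clusters`, the `min_support` and `max_size` parameters, the Apriori-style candidate generation, the in-process simulation of peers as Python lists, and the simplified greedy rule that picks the term set covering the most still-unassigned documents.

```python
from collections import Counter
from itertools import combinations

# Hypothetical setup: each peer is a list of documents, and each document is a
# set of terms (assumed already tokenized, stop-word filtered, and stemmed).


def local_itemset_counts(docs, candidates):
    """Count how many local documents contain each candidate term set."""
    counts = Counter()
    for doc in docs:
        for cand in candidates:
            if cand <= doc:
                counts[cand] += 1
    return counts


def global_frequent_termsets(peers, min_support, max_size=3):
    """Apriori-style mining in which peers share only count vectors, never documents."""
    total_docs = sum(len(docs) for docs in peers)
    vocab = set().union(*(doc for docs in peers for doc in docs))
    candidates = [frozenset([t]) for t in sorted(vocab)]
    frequent = {}
    size = 1
    while candidates and size <= max_size:
        # Each peer counts locally; summing the counters stands in for the
        # (assumed) aggregation step over the network.
        merged = Counter()
        for docs in peers:
            merged.update(local_itemset_counts(docs, candidates))
        level = {c: merged[c] / total_docs for c in candidates
                 if merged[c] / total_docs >= min_support}
        frequent.update(level)
        # Join pairs of frequent sets to form next-level candidates.
        size += 1
        candidates = list({a | b for a, b in combinations(level, 2)
                           if len(a | b) == size})
    return frequent


def greedy_clusters(peers, frequent):
    """Simplified greedy selection: repeatedly pick the frequent term set that
    covers the most still-unassigned documents across all peers."""
    assigned = [[None] * len(docs) for docs in peers]
    remaining = set(frequent)
    clusters = []
    while remaining:
        gain = Counter()
        for p, docs in enumerate(peers):
            for i, doc in enumerate(docs):
                if assigned[p][i] is None:
                    for ts in remaining:
                        if ts <= doc:
                            gain[ts] += 1
        if not gain:
            break  # every document is assigned, or none matches a remaining set
        # Ties favour longer (more descriptive) term sets, then alphabetical order.
        best = max(gain, key=lambda ts: (gain[ts], len(ts), sorted(ts)))
        remaining.discard(best)
        clusters.append(best)
        # Each peer labels its own documents locally with the chosen description.
        for p, docs in enumerate(peers):
            for i, doc in enumerate(docs):
                if assigned[p][i] is None and best <= doc:
                    assigned[p][i] = best
    return clusters, assigned


if __name__ == "__main__":
    peer_a = [{"p2p", "network", "routing"}, {"p2p", "network", "overlay"},
              {"cluster", "kmeans", "text"}]
    peer_b = [{"cluster", "text", "term"}, {"p2p", "routing", "dht"},
              {"cluster", "text", "kmeans"}]
    freq = global_frequent_termsets([peer_a, peer_b], min_support=0.3)
    clusters, _ = greedy_clusters([peer_a, peer_b], freq)
    for c in clusters:
        print(sorted(c))  # ['cluster', 'text'] then ['p2p']
```

In this toy run only term-set counts cross peer boundaries, which reflects the communication saving the abstract describes, and each selected term set doubles as a human-readable cluster description. A real peer-to-peer system would replace the in-process summation with some network aggregation protocol, and the selection rule here is deliberately simpler than overlap-based criteria used in published frequent-term clustering methods.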