New Cluster Detection Based on Multi-Representation Index Tree Text Clustering

2010 2nd International Workshop on Database Technology and Applications. Pub Date: 2010-12-06. DOI: 10.1109/DBTA.2010.5659018

Hui Song, Lifeng Wang, Baiyan Li, Xiaoqiang Liu

Citations: 0

Abstract

Traditional clustering is a powerful technique for revealing the "hot" topics among documents. However, it has difficulty discovering new types of events that emerge gradually. In this paper, we propose a novel model for detecting new clusters in time-streaming documents. It consists of three parts: a cluster definition based on the Multi-Representation Index Tree (MI-Tree), the new-cluster detection process, and the metrics for measuring a new cluster. In contrast to the traditional method, we process the newly arriving data first and then merge the old clustering tree into the new one. This avoids the effect in which highly similar documents are assigned to different clusters. We designed and implemented a system for practical application; experimental results on a variety of domains demonstrate that our algorithm can recognize valuable new clusters during the iteration process and produce quality clusters.
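The "new data first, then merge the old clusters in" order described in the abstract can be illustrated with a small sketch. This is not the authors' MI-Tree algorithm: the abstract does not specify the tree structure or the new-cluster metrics, so the sketch substitutes flat centroid clusters, whitespace tokenization, cosine similarity, and an arbitrary threshold of 0.3. All names (`Cluster`, `cluster_batch`, `merge_old_into_new`) are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Cluster:
    centroid: Counter = field(default_factory=Counter)  # summed term counts
    docs: list = field(default_factory=list)
    is_new: bool = True  # stays True until an old cluster is merged in

    def add(self, doc: str) -> None:
        self.docs.append(doc)
        self.centroid.update(doc.lower().split())


def cluster_batch(docs, threshold=0.3):
    """Greedy single-pass clustering of one batch (assumed stand-in for tree building)."""
    clusters = []
    for doc in docs:
        vec = Counter(doc.lower().split())
        best = max(clusters, key=lambda c: cosine(vec, c.centroid), default=None)
        if best is None or cosine(vec, best.centroid) < threshold:
            best = Cluster()
            clusters.append(best)
        best.add(doc)
    return clusters


def merge_old_into_new(new_clusters, old_clusters, threshold=0.3):
    """Fold each old cluster into its most similar new cluster, if similar enough."""
    merged = list(new_clusters)
    for old in old_clusters:
        best = max(new_clusters, key=lambda c: cosine(old.centroid, c.centroid), default=None)
        if best is not None and cosine(old.centroid, best.centroid) >= threshold:
            best.docs = old.docs + best.docs
            best.centroid.update(old.centroid)
            best.is_new = False  # topic already existed in the old data
        else:
            old.is_new = False   # unmatched old topic is carried over unchanged
            merged.append(old)
    return merged
```

Under these assumptions, a caller would build clusters from the previous documents with `cluster_batch`, build clusters from the newly arrived batch the same way, and pass both to `merge_old_into_new`. Clusters whose `is_new` flag is still set afterwards had no sufficiently similar counterpart in the old data and are candidates for new topics.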
