Search Results Clustering Algorithm Based on the Suffix Tree

2015 2nd International Conference on Information Science and Control Engineering

Pub Date: 2015-04-24

DOI: 10.1109/ICISCE.2015.106

Dengwei Wang, Libo Liu, Jing Dong, Jiao Zheng

Citations: 1

Abstract

The suffix tree clustering (STC) algorithm clusters documents based on the phrases they share and runs in linear time. To address shortcomings of the existing STC algorithm, namely the quality of its clustering results and the screening of its cluster labels, this paper improves the algorithm in three places: the selection of base clusters, the similarity formula used to merge base clusters, and the scoring function for cluster labels. Entropy is used as the evaluation criterion for the clustering results. Experiments show that the improved algorithm clusters better than the original and yields more readable, descriptive, and distinguishable cluster labels.
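The pipeline the abstract refers to can be sketched as follows. This is a minimal illustration of the original STC idea, not the paper's improved method: the abstract does not give the improved base-cluster selection, similarity formula, or label scoring function, so none of them appear here. For brevity the sketch finds shared phrases by enumerating word n-grams instead of building a suffix tree, which gives up the linear-time property. The merge rule is the overlap test from the original STC algorithm (two base clusters are linked when their intersection exceeds half of each). The function names, the `max_len` and `min_docs` parameters, and the toy documents are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def base_clusters(docs, max_len=3, min_docs=2):
    """Map each shared phrase (word n-gram) to the set of document ids containing it.

    Assumption: n-gram enumeration stands in for the suffix tree.
    """
    phrase_docs = defaultdict(set)
    for doc_id, text in enumerate(docs):
        words = text.lower().split()
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                phrase_docs[tuple(words[i:i + n])].add(doc_id)
    # A base cluster must be shared by at least `min_docs` documents.
    return {p: d for p, d in phrase_docs.items() if len(d) >= min_docs}


def similar(a, b, threshold=0.5):
    """Original STC overlap test: the intersection covers more than half of each set."""
    overlap = len(a & b)
    return overlap / len(a) > threshold and overlap / len(b) > threshold


def merge_clusters(clusters, threshold=0.5):
    """Merge base clusters that fall in the same connected component of the similarity graph."""
    phrases = list(clusters)
    parent = list(range(len(phrases)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(len(phrases)):
        for j in range(i + 1, len(phrases)):
            if similar(clusters[phrases[i]], clusters[phrases[j]], threshold):
                parent[find(i)] = find(j)

    merged = defaultdict(set)
    for i, phrase in enumerate(phrases):
        merged[find(i)] |= clusters[phrase]
    return list(merged.values())


def clustering_entropy(clusters, labels):
    """Size-weighted average entropy of the true labels inside each cluster (0 means pure)."""
    total = sum(len(c) for c in clusters)
    result = 0.0
    for c in clusters:
        counts = Counter(labels[d] for d in c)
        h = -sum((k / len(c)) * math.log2(k / len(c)) for k in counts.values())
        result += len(c) / total * h
    return result


# Toy search-result snippets (hypothetical data).
docs = [
    "suffix tree clustering",
    "suffix tree search",
    "neural network model",
    "neural network training",
]
final = merge_clusters(base_clusters(docs))
print(sorted(sorted(c) for c in final))
print(clustering_entropy(final, ["a", "a", "b", "b"]))
```

Lower entropy indicates purer clusters. Because STC allows a document to belong to several clusters, the weights here use the sum of cluster sizes, not the number of documents.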



