The research on Chinese document clustering based on WEKA

2011 International Conference on Machine Learning and Cybernetics Pub Date : 2011-07-10 DOI:10.1109/ICMLC.2011.6016955

P. Han, Dongbo Wang, Qingwei Zhao

Citations: 12

Abstract

This paper reports an experiment on Chinese document clustering with WEKA. WEKA is an excellent open-source data mining tool from abroad, but it is rarely used in China. We clustered Chinese documents with the K-means algorithm, adjusting its parameters in WEKA. Recall, precision, and F-measure are used to evaluate the results. We hope this provides a reference for researchers in the field.
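
The abstract does not say how recall, precision, and F-measure were computed for clusters. As a minimal sketch, assuming the commonly used class-weighted definition (for each true class, take the best F-measure over all clusters, then weight by class size), the evaluation could look like the following. The function name `clustering_f_measure` and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def clustering_f_measure(true_labels, cluster_ids):
    """Class-weighted F-measure of a flat clustering.

    Assumed definition (not confirmed by the paper): for class i and
    cluster j, precision = n_ij / n_j and recall = n_ij / n_i; each class
    contributes its best F over all clusters, weighted by n_i / n.
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("label and cluster lists must have the same length")
    n = len(true_labels)
    if n == 0:
        raise ValueError("at least one document is required")

    class_sizes = Counter(true_labels)              # n_i
    cluster_sizes = Counter(cluster_ids)            # n_j
    overlap = Counter(zip(true_labels, cluster_ids))  # n_ij

    total = 0.0
    for cls, n_cls in class_sizes.items():
        best = 0.0
        for clu, n_clu in cluster_sizes.items():
            n_both = overlap[(cls, clu)]
            if n_both == 0:
                continue  # F is 0 when the class and cluster share no documents
            precision = n_both / n_clu
            recall = n_both / n_cls
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_cls / n) * best
    return total


# Toy example: six documents, two true categories, two clusters.
labels = ["sports", "sports", "sports", "finance", "finance", "finance"]
clusters = [0, 0, 1, 1, 1, 1]
print(round(clustering_f_measure(labels, clusters), 3))  # 0.829
```

The cluster assignments would come from the K-means run in WEKA; here they are hard-coded so the sketch stays self-contained.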
