The research on Chinese document clustering based on WEKA

2011 International Conference on Machine Learning and Cybernetics. Pub Date: 2011-07-10. DOI: 10.1109/ICMLC.2011.6016955

P. Han, Dongbo Wang, Qingwei Zhao

Citations: 12

Abstract

This paper presents an experiment on Chinese document clustering based on WEKA. WEKA is regarded abroad as an excellent open-source data mining tool, but it is rarely used in China. We clustered Chinese documents with the K-means algorithm, adjusting its parameters in WEKA, and evaluated the results using recall, precision, and F-measure. We hope the work provides a reference for researchers in this field.
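The abstract names the pipeline (K-means clustering evaluated by recall, precision and F-measure) but not its details. The following is a minimal Python sketch under stated assumptions, not the authors' WEKA configuration: documents are assumed to be already word-segmented into space-separated tokens, weighted with TF-IDF, clustered with plain K-means using Euclidean distance on unit-length vectors, and scored with the class-weighted clustering F-measure (for each true class, the best F over all clusters, averaged by class size). The function names `tfidf_vectors`, `sq_dist`, `kmeans` and `clustering_f_measure`, the farthest-first seeding, and the toy corpus are hypothetical illustrations. In WEKA itself, the corresponding steps would presumably go through the StringToWordVector filter and the SimpleKMeans clusterer.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return unit-length TF-IDF vectors for documents whose tokens are
    separated by spaces (assumes Chinese text was word-segmented beforehand)."""
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n = len(docs)
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[t] * math.log(n / df[t]) for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid dividing by zero
        vectors.append([x / norm for x in vec])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=100):
    """Plain K-means; returns one cluster index per vector."""
    # Farthest-first seeding (a choice made for this sketch): start from the
    # first vector, then add the vector farthest from every chosen centroid.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid.
        new = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        if new == assignment:  # converged: no vector changed cluster
            break
        assignment = new
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def clustering_f_measure(labels, clusters):
    """Class-weighted F-measure: for each true class take the best F over all
    clusters, then average weighted by class size."""
    n = len(labels)
    class_sizes = Counter(labels)
    cluster_sizes = Counter(clusters)
    joint = Counter(zip(labels, clusters))
    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = joint[(cls, clu)]
            if n_ij:
                precision = n_ij / n_j  # share of the cluster that belongs to this class
                recall = n_ij / n_i     # share of the class captured by this cluster
                best = max(best, 2 * precision * recall / (precision + recall))
        total += n_i / n * best
    return total


# Toy corpus (hypothetical, pre-segmented): two sports and two finance documents.
docs = ["足球 比赛 球队 进球", "球队 足球 联赛 进球",
        "股票 市场 投资 基金", "基金 股票 投资 银行"]
labels = ["sports", "sports", "finance", "finance"]
clusters = kmeans(tfidf_vectors(docs), k=2)
print(clusters, clustering_f_measure(labels, clusters))  # [0, 0, 1, 1] 1.0
```

Farthest-first seeding is used only to keep this toy run reproducible. K-means results generally depend on the initial centroids and on k, which is presumably part of what adjusting the parameters in WEKA involves.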
