Text Document Clustering with Hybrid Feature Selection

International Conference on Information Integration and Web-based Applications & Services. Pub Date: 2013-12-02. DOI: 10.1145/2539150.2539225

Asmaa Benghabrit, B. Frikh, B. Ouhbi, E. Zemmouri, Hicham Behja

Citations: 6

Abstract

Finding relevant information and making it understandable for human research is a delicate task when an enormous number of unstructured texts is created daily. This is the purpose of clustering algorithms, which belong to the family of powerful text mining tools. In this paper, we propose a novel text document clustering approach based on a new hybrid feature selection method that we call HFSM. This technique extracts statistically and semantically relevant terms to guide the clustering mechanism. Experiments conducted on the Reuters corpus demonstrate the practical aspects of our algorithm and show that it generates more accurate clusterings than those obtained by other existing algorithms.


