Textual Document Clustering in Traditional and Modern Approaches (Review)

2018 IEEE Conference on Systems, Process and Control (ICSPC) Pub Date : 2018-12-01 DOI:10.1109/SPC.2018.8704130

W. Yafooz, Z. Bakar, A. Mithun

Citations: 1

Abstract

An enormous quantity of textual documents is created through the use of advanced technology for description, intelligence, interconnection, and thousands of distinct authorizations, and it keeps expanding every moment of daily life. Clustering is an established automated process that organizes a data collection based on its features. Without applying a clustering technique to textual data, a huge quantity of unstructured data loses its capacity for sharing knowledge. Many tools and techniques have been proposed. This paper presents textual document clustering algorithms (approaches) and categorizes them into two types: classical and modern. Both types are applied to textual data to obtain and consolidate knowledge, turning loosely organized text into a well-prepared document collection. Two important factors in the clustering process are its speed and the accuracy, or purity, of the resulting clusters. This review can benefit researchers concerned with textual document clustering and text mining, as well as data scientists.
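The abstract names purity as one of the two factors by which a clustering is judged. As a minimal sketch, assuming the common definition of purity (the share of documents that belong to the majority reference class of their cluster), it could be computed as follows. The function name `purity` and the toy data are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def purity(cluster_ids, true_labels):
    """Return the fraction of documents in the majority class of their cluster.

    Assumed definition: sum over clusters of the largest class count,
    divided by the total number of documents.
    """
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have the same length")
    if not cluster_ids:
        raise ValueError("at least one document is required")

    # Count the reference labels that appear inside each cluster.
    by_cluster = {}
    for cid, label in zip(cluster_ids, true_labels):
        by_cluster.setdefault(cid, Counter())[label] += 1

    # Add up the size of the majority class of every cluster.
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(cluster_ids)


# Hypothetical toy data: six documents, two clusters, three reference topics.
clusters = [0, 0, 0, 1, 1, 1]
topics = ["sports", "sports", "politics", "tech", "tech", "tech"]
score = purity(clusters, topics)
```

A purity of 1.0 means every cluster contains documents from a single reference class. The measure rises trivially as the number of clusters grows, so it is usually read alongside the cluster count and the running time.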

