A Novel Data Mining Approach for Multi Variant Text Classification

2015 IEEE International Conference on Cloud Computing in Emerging Markets (CCEM). Pub Date: 2015-11-01. DOI: 10.1109/CCEM.2015.11

K. Dsouza, Zaheed Ahmed Ansari

Citations: 5

Abstract

Text classification, which aims to assign a document to one or more categories based on its content, is a fundamental task for Web and document data mining applications. It is emerging as an important part of natural language processing and information extraction, where it can be used to discover useful information in large databases. These approaches allow individuals to construct classifiers that are relevant to a variety of domains. Existing tools such as SVM Light offer little GUI support and take more time to perform the classification task. In the presented work, multi-domain documents are classified using the Weka-LibSVM classifier. The vector space model is used to transform the collected training-set and test-set documents into a term-document matrix (TDM), which the classifier then uses to generate predicted results. The results obtained from Weka, with its GUI support, using the TDM show a quick response time in classifying the documents.
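
The vector space model step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' Weka pipeline: the tokenizer, the function names (`tokenize`, `build_vocabulary`, `term_document_matrix`), the use of raw term counts, and the sample documents are all assumptions made for illustration. The sketch builds the vocabulary from the training documents only and lays the matrix out with one row per document, so that test documents are projected onto the same term columns.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents):
    """Return the sorted list of distinct terms found in the given documents."""
    return sorted({term for doc in documents for term in tokenize(doc)})


def term_document_matrix(documents, vocabulary):
    """Return one row of raw term counts per document.

    Columns follow the order of `vocabulary`; terms that are not in the
    vocabulary are ignored, so unseen test-set words do not add columns.
    """
    rows = []
    for doc in documents:
        counts = Counter(tokenize(doc))
        rows.append([counts[term] for term in vocabulary])
    return rows


# Hypothetical sample documents, for illustration only.
train_docs = [
    "cloud computing cloud storage",
    "text mining and text classification",
]
test_docs = ["cloud text analytics"]

vocab = build_vocabulary(train_docs)
train_tdm = term_document_matrix(train_docs, vocab)
test_tdm = term_document_matrix(test_docs, vocab)
```

Each row of `train_tdm` would then be paired with its category label and passed to a classifier, and each row of `test_tdm` would be scored by the trained model. Weighting schemes such as TF-IDF could replace the raw counts; the abstract does not say which weighting the authors used.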



