A Novel Data Mining Approach for Multi Variant Text Classification

K. Dsouza, Zaheed Ahmed Ansari
2015 IEEE International Conference on Cloud Computing in Emerging Markets (CCEM), published 2015-11-01. DOI: 10.1109/CCEM.2015.11
Citations: 5

Abstract

Text classification, which aims to assign a document to one or more categories based on its content, is a fundamental task for Web and document data mining applications. It is becoming increasingly important in natural language processing and information extraction, where it can be used to discover useful information in large databases. These approaches allow individuals to build classifiers that are relevant to a variety of domains. Existing tools such as SVMlight offer little GUI support and take longer to perform classification tasks. In this work, multi-domain documents are classified using the Weka LibSVM classifier. The vector space model is used to transform the collected training-set and test-set documents into a term-document matrix (TDM), which the classifier then uses to generate predictions. Using the TDM, Weka, with its GUI support, classified the documents with a quick response time.
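The abstract does not describe the preprocessing in detail. The following is therefore only a minimal sketch of the vector-space step, under stated assumptions. It builds a raw term-frequency matrix and writes it as a Weka ARFF file, which could then be loaded in Weka and classified with LibSVM. The tokenization rule (lowercase alphabetic runs, no stemming or stop-word removal), the helper names `build_tdm` and `to_arff`, the relation name, and the toy documents and labels are all hypothetical. They are not taken from the paper. Note also that the rows here are documents and the columns are terms, because Weka expects one instance per row.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumption: lowercase alphabetic runs only; no stemming or stop-word removal
    return re.findall(r"[a-z]+", text.lower())

def build_tdm(docs):
    """Return (vocab, matrix), where matrix[i][j] counts term vocab[j] in document i."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))  # union of all terms seen in any document
    matrix = [[c[t] for t in vocab] for c in counts]
    return vocab, matrix

def to_arff(relation, vocab, matrix, labels, classes):
    # Weka ARFF: one NUMERIC attribute per term, then a nominal class attribute
    lines = [f"@RELATION {relation}", ""]
    lines += [f"@ATTRIBUTE term_{t} NUMERIC" for t in vocab]
    lines.append("@ATTRIBUTE class {" + ",".join(classes) + "}")
    lines += ["", "@DATA"]
    for row, label in zip(matrix, labels):
        lines.append(",".join(str(v) for v in row) + "," + label)
    return "\n".join(lines)

# Hypothetical toy corpus with two domains
docs = ["cloud computing services", "svm text classification", "text mining of cloud data"]
labels = ["cloud", "mining", "mining"]
vocab, tdm = build_tdm(docs)
arff = to_arff("hypothetical_docs", vocab, tdm, labels, ["cloud", "mining"])
print(arff)
```

Raw counts keep the sketch simple. A TF-IDF weighting is a common alternative within the same vector space model, but the abstract does not say which weighting the authors used.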