Artificial bee colony algorithm for feature selection and improved support vector machine for text classification

IF 2.1 Q2 INFORMATION SCIENCE & LIBRARY SCIENCE
J. Balakumar, S. Mohan
{"title":"Artificial bee colony algorithm for feature selection and improved support vector machine for text classification","authors":"J. Balakumar, S. Mohan","doi":"10.1108/IDD-09-2018-0045","DOIUrl":null,"url":null,"abstract":"\nPurpose\nOwing to the huge volume of documents available on the internet, text classification becomes a necessary task to handle these documents. To achieve optimal text classification results, feature selection, an important stage, is used to curtail the dimensionality of text documents by choosing suitable features. The main purpose of this research work is to classify the personal computer documents based on their content.\n\n\nDesign/methodology/approach\nThis paper proposes a new algorithm for feature selection based on artificial bee colony (ABCFS) to enhance the text classification accuracy. The proposed algorithm (ABCFS) is scrutinized with the real and benchmark data sets, which is contrary to the other existing feature selection approaches such as information gain and χ2 statistic. To justify the efficiency of the proposed algorithm, the support vector machine (SVM) and improved SVM classifier are used in this paper.\n\n\nFindings\nThe experiment was conducted on real and benchmark data sets. The real data set was collected in the form of documents that were stored in the personal computer, and the benchmark data set was collected from Reuters and 20 Newsgroups corpus. The results prove the performance of the proposed feature selection algorithm by enhancing the text document classification accuracy.\n\n\nOriginality/value\nThis paper proposes a new ABCFS algorithm for feature selection, evaluates the efficiency of the ABCFS algorithm and improves the support vector machine. In this paper, the ABCFS algorithm is used to select the features from text (unstructured) documents. Although, there is no text feature selection algorithm in the existing work, the ABCFS algorithm is used to select the data (structured) features. The proposed algorithm will classify the documents automatically based on their content.\n","PeriodicalId":43488,"journal":{"name":"Information Discovery and Delivery","volume":" ","pages":""},"PeriodicalIF":2.1000,"publicationDate":"2019-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1108/IDD-09-2018-0045","citationCount":"14","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Discovery and Delivery","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1108/IDD-09-2018-0045","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"INFORMATION SCIENCE & LIBRARY SCIENCE","Score":null,"Total":0}
引用次数: 14

Abstract

Purpose Owing to the huge volume of documents available on the internet, text classification becomes a necessary task to handle these documents. To achieve optimal text classification results, feature selection, an important stage, is used to curtail the dimensionality of text documents by choosing suitable features. The main purpose of this research work is to classify the personal computer documents based on their content. Design/methodology/approach This paper proposes a new algorithm for feature selection based on artificial bee colony (ABCFS) to enhance the text classification accuracy. The proposed algorithm (ABCFS) is scrutinized with the real and benchmark data sets, which is contrary to the other existing feature selection approaches such as information gain and χ2 statistic. To justify the efficiency of the proposed algorithm, the support vector machine (SVM) and improved SVM classifier are used in this paper. Findings The experiment was conducted on real and benchmark data sets. The real data set was collected in the form of documents that were stored in the personal computer, and the benchmark data set was collected from Reuters and 20 Newsgroups corpus. The results prove the performance of the proposed feature selection algorithm by enhancing the text document classification accuracy. Originality/value This paper proposes a new ABCFS algorithm for feature selection, evaluates the efficiency of the ABCFS algorithm and improves the support vector machine. In this paper, the ABCFS algorithm is used to select the features from text (unstructured) documents. Although, there is no text feature selection algorithm in the existing work, the ABCFS algorithm is used to select the data (structured) features. The proposed algorithm will classify the documents automatically based on their content.
人工蜂群算法用于特征选择和改进的支持向量机用于文本分类
目的由于互联网上的海量文档,文本分类成为处理这些文档的必要任务。为了获得最佳的文本分类结果,特征选择是文本分类的一个重要阶段,通过选择合适的特征来降低文本文档的维数。本研究的主要目的是对个人电脑文档进行内容分类。为了提高文本分类的准确率,本文提出了一种基于人工蜂群(ABCFS)的特征选择算法。该算法与现有的信息增益和χ2统计等特征选择方法不同,采用真实数据集和基准数据集对算法进行了检验。为了验证该算法的有效性,本文采用了支持向量机(SVM)和改进的SVM分类器。实验结果在真实数据集和基准数据集上进行。真实数据集以文档的形式收集,存储在个人计算机中,基准数据集收集自Reuters和20 Newsgroups语料库。实验结果证明了所提特征选择算法的有效性,提高了文本文档的分类准确率。提出了一种新的ABCFS特征选择算法,对ABCFS算法的效率进行了评价,并对支持向量机进行了改进。本文采用ABCFS算法从文本(非结构化)文档中选择特征。虽然现有工作中没有文本特征选择算法,但使用ABCFS算法来选择数据(结构化)特征。该算法将根据内容自动对文档进行分类。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Information Discovery and Delivery
Information Discovery and Delivery INFORMATION SCIENCE & LIBRARY SCIENCE-
CiteScore
5.40
自引率
4.80%
发文量
21
期刊介绍: Information Discovery and Delivery covers information discovery and access for digital information researchers. This includes educators, knowledge professionals in education and cultural organisations, knowledge managers in media, health care and government, as well as librarians. The journal publishes research and practice which explores the digital information supply chain ie transport, flows, tracking, exchange and sharing, including within and between libraries. It is also interested in digital information capture, packaging and storage by ‘collectors’ of all kinds. Information is widely defined, including but not limited to: Records, Documents, Learning objects, Visual and sound files, Data and metadata and , User-generated content.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信