Incremental text categorization based on hybrid optimization-based deep belief neural network

Impact factor: 0.7 · JCR Q4 (Computer Science, Information Systems)
V. Srilakshmi, K. Anuradha, C. Bindu
Journal of High Speed Networks, vol. 31, no. 1, pp. 183-202, 2021. DOI: 10.3233/JHS-210659 (https://doi.org/10.3233/JHS-210659)
Citations: 1

Abstract

Incremental learning is an effective approach to text categorization when the data are large-scale and accumulate over time. Its major challenge is improving accuracy, because text documents contain numerous terms. This research develops an incremental text categorization method based on the proposed Spider Grasshopper Crow Optimization Algorithm-based Deep Belief Neural network (SGrC-based DBN). The method comprises pre-processing, feature extraction, feature selection, text categorization, and incremental learning. First, the database is pre-processed and fed into a vector space model to extract features. Feature selection is then carried out based on mutual information. Next, text categorization is performed using the proposed SGrC-based DBN, in which SGrC integrates spider monkey optimization (SMO) with the Grasshopper Crow Optimization Algorithm (GCOA). Finally, incremental text categorization is performed using a hybrid weight bounding model that combines SGrC and the Range degree model; the optimal weights of the Range degree model are selected by SGrC. The method is evaluated on data from the Reuters database and the 20 Newsgroups database, and compared with existing methods in terms of precision, recall, and accuracy. The proposed SGrC algorithm achieved a maximum accuracy of 0.9626, a maximum precision of 0.9681, and a maximum recall of 0.9600 in comparison with existing incremental text categorization methods.
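The first stages named in the abstract (vector space model, then mutual-information feature selection) can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenizer, the use of raw term counts, the presence/absence form of mutual information, and all function names are assumptions made for illustration. The SGrC-based DBN and the hybrid weight bounding model are not reproduced here, because the abstract does not specify them.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-letters (a stand-in for the pre-processing step)."""
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return cleaned.split()


def term_frequency_vectors(docs):
    """Vector space model: one term-count vector per document over a shared vocabulary."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[t] for t in vocab])
    return vocab, vectors


def mutual_information(vocab, vectors, labels):
    """Mutual information (in nats) between term presence and class label, per term."""
    n = len(labels)
    classes = sorted(set(labels))
    scores = {}
    for j, term in enumerate(vocab):
        presence = [v[j] > 0 for v in vectors]
        mi = 0.0
        for x in (False, True):
            p_x = sum(1 for p in presence if p == x) / n
            for c in classes:
                joint = sum(1 for p, y in zip(presence, labels) if p == x and y == c)
                if joint == 0:
                    continue  # a zero joint probability contributes nothing
                p_xy = joint / n
                p_y = labels.count(c) / n
                mi += p_xy * math.log(p_xy / (p_x * p_y))
        scores[term] = mi
    return scores


def select_top_k(scores, k):
    """Keep the k terms with the highest mutual information (ties broken alphabetically)."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

With a labelled corpus, `term_frequency_vectors` followed by `mutual_information` and `select_top_k` yields the reduced term set that a classifier would then consume. A term that appears in every document of one class and in none of the other receives the maximum score for a balanced two-class corpus, ln 2.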
Source journal
Journal of High Speed Networks (Computer Science: Computer Networks and Communications)
CiteScore: 1.80
Self-citation rate: 11.10%
Articles published: 26
Journal description: The Journal of High Speed Networks is an international archival journal, active since 1992, providing a publication vehicle for a large number of topics of interest in the high performance networking and communication area. Its audience includes researchers, managers, and network designers and operators. The main goal is to provide timely dissemination of information and scientific knowledge. The journal publishes contributed papers on novel research, survey and position papers on topics of current interest, technical notes, and short communications reporting progress on long-term projects. Submissions are refereed consistently with the review process of leading technical journals, based on originality, significance, quality, and clarity. The journal publishes papers on topics ranging from design to practical experience with operational high performance/speed networks.