Boosting and bagging classification for computer science journal

Nastiti Susetyo Fanany Putri, A. Wibawa, Harits Ar Rasyid, A. Nafalski, Ummi Rabaah Hasyim
Journal Article. Published 2023-03-15. DOI: 10.26555/ijain.v9i1.985 (https://doi.org/10.26555/ijain.v9i1.985)
Citations: 1

Abstract

In recent years, data processing has become an issue across all disciplines. Good data processing can provide decision-making recommendations. Data processing is covered in academic publications, including those in computer science. The topic has grown over the past three years, showing that data processing is expanding and diversifying and that interest in this area of study is high. Journals are grouped into quartiles that indicate a journal's influence on similar studies; SCImago provides this categorization. There are four quartiles, with 1 being the highest and 4 the lowest. However, quartile classes differ in many cases, with the same journal receiving different quartile values in different disciplines. A classification method is therefore proposed to address this issue. Classification is a machine-learning technique that groups data based on the supplied class label. This study used ensemble boosting and bagging with Decision Tree (DT) and Gaussian Naive Bayes (GNB). The depth and estimator settings of the ensemble algorithms were varied to examine how increasing these values affects the resulting precision. For the DT algorithm, both variables were altered, whereas for the GNB algorithm only the estimator value was modified. Based on the average accuracy results, the best algorithm for the computer science dataset is GNB Bagging, with values of 68.96%, 70.99%, and 69.05%. Second-place XGBDT has 67.75% accuracy, 67.69% precision, and 67.83% recall. The DT Bagging method placed third with 67.30% accuracy, 68.13% precision, and 67.31% recall. Fourth is the XGBoost GNB approach, with an accuracy of 67.07%, a precision of 68.85%, and a recall of 67.18%. The AdaBoost DT technique ranks fifth with an accuracy of 63.65%, a precision of 64.21%, and a recall of 63.63%. AdaBoost GNB is the least effective algorithm for this dataset, achieving only 43.19% accuracy, 48.14% precision, and 43.2% recall. The results are still quite far from ideal; hence, the proposed method is not advised for journal quartile inequality issues.
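The bagging approach with a Gaussian Naive Bayes learner, and the effect of varying the number of estimators, can be illustrated with a minimal sketch. This is not the authors' implementation or dataset: it assumes a hand-written Gaussian Naive Bayes learner, synthetic two-feature data with four classes standing in for quartile labels, and hypothetical helper names (`fit_gnb`, `predict_gnb`, `fit_bagging`, `predict_bagging`, `make_data`).

```python
import math
import random
from collections import Counter


def fit_gnb(X, y):
    """Fit Gaussian Naive Bayes: class log-prior plus per-feature mean and variance."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict_gnb(model, x):
    """Return the class with the highest log posterior for one sample."""
    def log_posterior(params):
        log_prior, means, variances = params
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances))
    return max(model, key=lambda label: log_posterior(model[label]))


def fit_bagging(X, y, n_estimators, rng):
    """Train one GNB model per bootstrap resample (sampling with replacement)."""
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        models.append(fit_gnb([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_bagging(models, x):
    """Combine the ensemble members by majority vote."""
    votes = Counter(predict_gnb(m, x) for m in models)
    return votes.most_common(1)[0][0]


def make_data(n_per_class, rng):
    """Synthetic stand-in data: one Gaussian cluster per assumed quartile label."""
    centers = {"Q1": (0.0, 0.0), "Q2": (5.0, 0.0),
               "Q3": (0.0, 5.0), "Q4": (5.0, 5.0)}
    X, y = [], []
    for label, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            X.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
            y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = make_data(100, rng)
X_test, y_test = make_data(50, rng)

# Vary only the number of estimators, as the abstract describes for GNB.
accuracies = {}
for n_estimators in (1, 5, 25):
    models = fit_bagging(X_train, y_train, n_estimators, rng)
    hits = sum(predict_bagging(models, x) == t for x, t in zip(X_test, y_test))
    accuracies[n_estimators] = hits / len(X_test)
    print(f"n_estimators={n_estimators}: accuracy={accuracies[n_estimators]:.3f}")
```

Because the synthetic clusters are well separated, accuracy here is much higher than the figures reported in the abstract; the sketch only shows the mechanics of bootstrap resampling and majority voting, not the study's results.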
Source journal: International Journal of Advances in Intelligent Informatics (Computer Science: Computer Vision and Pattern Recognition)
CiteScore: 3.00
Self-citation rate: 0.00%
Articles published: 0