Comparative Analysis of Accuracy of Random Forest and Gradient Boosting Classifier Algorithm for Diabetes Classification

Sahat Pandapotan Nainggolan, Ardiles Sinaga

Journal: Sebatik
Published: 2023-06-06
DOI: 10.46984/sebatik.v27i1.2157 (https://doi.org/10.46984/sebatik.v27i1.2157)
Citations: 0

Abstract
Diabetes is a disease characterized by high blood sugar (glucose) levels. If blood sugar is not properly controlled, it can lead to various serious conditions, diabetes among them. This study compares the Random Forest algorithm and the Gradient Boosting Classifier algorithm on diabetes classification, evaluating accuracy, precision, recall, and F1 score. The study is descriptive and uses the Pima Indians Diabetes Dataset from Kaggle. With an 80:20 split of the data, Random Forest reached an accuracy of 79% according to its confusion matrix, with an AUC of 0.835, recall of 78%, and precision of 90%; from recall and precision, an F1 score of 83% was obtained. The Gradient Boosting Classifier reached an accuracy of 81% according to its confusion matrix, with an AUC of 0.877, recall of 83%, and precision of 67%; from recall and precision, an F1 score of 74% was obtained. Accuracy was evaluated through the confusion matrix and the AUC value. On these two measures, the Gradient Boosting Classifier algorithm performs better than the Random Forest algorithm.
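
The F1 scores quoted in the abstract follow from the reported precision and recall, since F1 is their harmonic mean. The short Python sketch below recomputes them as a sanity check. It is not the authors' code: the function name `f1_score` and the dictionary layout are illustrative assumptions, and the inputs are the rounded percentages from the abstract, so the results should only match the reported F1 values to within about a percentage point.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract (rounded percentages).
reported = {
    "Random Forest": {"precision": 0.90, "recall": 0.78},
    "Gradient Boosting": {"precision": 0.67, "recall": 0.83},
}

# Recompute F1 for each model from the reported values.
f1_by_model = {
    name: f1_score(m["precision"], m["recall"]) for name, m in reported.items()
}

for name, f1 in f1_by_model.items():
    print(f"{name}: F1 = {f1:.3f}")
```

Note that on F1 the ordering is reversed relative to accuracy and AUC: Random Forest scores higher, which is why the abstract's conclusion is stated in terms of accuracy and AUC only.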