COMPARATIVE ANALYSIS OF ACCURACY OF RANDOM FOREST AND GRADIENT BOOSTING CLASSIFIER ALGORITHM FOR DIABETES CLASSIFICATION

Sebatik. Pub Date: 2023-06-06. DOI: 10.46984/sebatik.v27i1.2157

Sahat Pandapotan Nainggolan, Ardiles Sinaga



Abstract

Diabetes is a disease characterized by high blood sugar (glucose) levels. If blood sugar is not controlled properly, it can lead to various serious complications. This study compares the Random Forest Algorithm and the Gradient Boosting Classifier Algorithm for diabetes classification, evaluating each on accuracy, precision, recall, and F1 score. The study used a descriptive method, with the Pima Indians Diabetes Dataset from Kaggle as the data source. With an 80:20 data split, the Random Forest Algorithm achieved an accuracy of 79%, computed from its confusion matrix, with an AUC of 0.835, a recall of 78%, and a precision of 90%; these recall and precision values give an F1 score of 83%. The Gradient Boosting Classifier Algorithm achieved an accuracy of 81%, also computed from its confusion matrix, with an AUC of 0.877, a recall of 83%, and a precision of 67%; these recall and precision values give an F1 score of 74%. Accuracy in this study was evaluated through the confusion matrix and the AUC value. On these two measures, the Gradient Boosting Classifier Algorithm outperformed the Random Forest Algorithm, with higher accuracy (81% vs. 79%) and higher AUC (0.877 vs. 0.835).
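
The abstract derives each F1 score from the reported recall and precision. As a rough illustration of that arithmetic, the following minimal Python sketch recomputes F1 as the harmonic mean of precision and recall, and shows how accuracy, precision, and recall follow from confusion-matrix counts. The function names and the example counts are hypothetical and are not taken from the paper; only the percentages in `reported` come from the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def metrics_from_confusion(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Derive accuracy, precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


# Rounded percentages as reported in the abstract.
reported = {
    "Random Forest": {"precision": 0.90, "recall": 0.78, "f1": 0.83},
    "Gradient Boosting": {"precision": 0.67, "recall": 0.83, "f1": 0.74},
}

for name, m in reported.items():
    recomputed = f1_score(m["precision"], m["recall"])
    print(f"{name}: recomputed F1 = {recomputed:.3f}, reported F1 = {m['f1']:.2f}")

# Hypothetical counts, for illustration only (not the paper's confusion matrix).
example = metrics_from_confusion(tp=40, fp=10, fn=10, tn=94)
print(example)
```

Because the inputs here are already rounded to whole percentages, the recomputed values can differ from the reported F1 scores by about one percentage point; the underlying confusion matrices would be needed for an exact check.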
