COMPARATIVE ANALYSIS OF ACCURACY OF RANDOM FOREST AND GRADIENT BOOSTING CLASSIFIER ALGORITHM FOR DIABETES CLASSIFICATION
Sahat Pandapotan Nainggolan, Ardiles Sinaga
Sebatik (Journal Article), published 2023-06-06
DOI: 10.46984/sebatik.v27i1.2157 (https://doi.org/10.46984/sebatik.v27i1.2157)
Citation count: 0
Abstract
Diabetes is a disease characterized by high blood sugar (glucose) levels. If blood sugar is not controlled properly, it can lead to various serious complications. The purpose of this study was to compare the Random Forest algorithm and the Gradient Boosting Classifier algorithm for diabetes classification, evaluating each on accuracy, precision, recall, and F1 score. The study used a descriptive method, and the data source was the Pima Indians Diabetes Dataset from Kaggle. Using an 80:20 data split ratio, the Random Forest algorithm achieved an accuracy of 79%, computed from the confusion matrix, along with an AUC of 0.835, a recall of 78%, and a precision of 90%; from the recall and precision, an F1 score of 83% was obtained. The Gradient Boosting Classifier algorithm achieved an accuracy of 81%, also computed from the confusion matrix, along with an AUC of 0.877, a recall of 83%, and a precision of 67%, giving an F1 score of 74%. In this study, the evaluation was based on the confusion matrix and the AUC value. By accuracy and AUC, these results indicate that the Gradient Boosting Classifier algorithm outperforms the Random Forest algorithm.
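
The F1 scores quoted in the abstract follow from the reported precision and recall through the standard harmonic-mean formula, F1 = 2PR / (P + R). The short Python sketch below recomputes them as a sanity check. The function name `f1_from_precision_recall` and the `reported` dictionary are illustrative names, not taken from the paper, and the inputs are the rounded percentages from the abstract, so the recomputed values can differ from the reported ones by about one percentage point.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the F1 score, the harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Avoid division by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values as reported in the abstract (assumed inputs for this check).
reported = {
    "Random Forest": {"precision": 0.90, "recall": 0.78, "f1": 0.83},
    "Gradient Boosting": {"precision": 0.67, "recall": 0.83, "f1": 0.74},
}

for name, metrics in reported.items():
    f1 = f1_from_precision_recall(metrics["precision"], metrics["recall"])
    print(f"{name}: recomputed F1 = {f1:.3f}, reported F1 = {metrics['f1']:.2f}")
```

On these figures, Random Forest has the higher precision and F1 score, while Gradient Boosting has the higher recall, accuracy, and AUC, so which model is preferable depends on which metric matters most for the application.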