Application of Random Forest for The Classification Diabetes Mellitus Disease in RSUP Dr. M. Jamil Padang

Fazhira Anisha, Dodi Vionanda, Nonong Amalita, Zilrahmi
UNP Journal of Statistics and Data Science. Published 2023-03-08. DOI: 10.24036/ujsds/vol1-iss2/30 (https://doi.org/10.24036/ujsds/vol1-iss2/30)

Abstract

Diabetes Mellitus is a disease in which blood sugar levels exceed the normal range (GDS > 200 mg/dl). It may be defined as a disorder of insulin function in the pancreas. Diabetes Mellitus is a global health problem because its incidence is increasing in every part of the world, including Indonesia. The disease must be prevented and controlled so that it does not cause complications in other organs or even death. It is therefore necessary to study a method that predicts the occurrence of this disease and identifies the variables that most affect whether a person suffers from it. This can be accomplished with a classification method, one of which is Random Forest. This case study uses the randomForest package in RStudio. The main results of the study are the smallest out-of-bag (OOB) error rate (%) and the Variable Importance Measure (VIM), computed using Mean Decrease Accuracy (MDA) and Mean Decrease Gini (MDG) values. Classifying the incidence of Diabetes Mellitus at RSUP Dr. M. Jamil Padang with Random Forest gives an OOB error rate of 1.2%, that is, an accuracy of 98.8%. The optimal model was obtained with mtry = 4 and ntree = 1000. By MDA, the most influential variables are Age, Polyphagia, Polyuria, HB, and BMI, whereas by MDG they are Age, Polyphagia, BMI, HB, and Delayed Healing.
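The abstract links two quantities: the model is the one with the smallest OOB error over mtry and ntree, and the reported accuracy is the complement of that error (98.8% = 100% - 1.2%). The study itself used R's randomForest package; what follows is only a minimal pure-Python sketch of that selection procedure, not the authors' code. It assumes synthetic toy data in place of the hospital records, depth-1 trees (stumps) in place of fully grown trees, and a small hypothetical grid of `mtry` and `ntree` values; every function name here is illustrative.

```python
import random
from collections import Counter


def make_toy_data(n, rng):
    """Synthetic stand-in data (NOT the hospital records): 4 features, label set by feature 0."""
    X = [[rng.random() for _ in range(4)] for _ in range(n)]
    y = [1 if row[0] > 0.5 else 0 for row in X]
    return X, y


def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def weighted_gini(groups):
    """Sum over both sides of a split of (group size * Gini impurity)."""
    total = 0.0
    for g in groups:
        total += len(g) * (1.0 - sum((c / len(g)) ** 2 for c in Counter(g).values()))
    return total


def fit_stump(X, y, boot, features):
    """Best single split on the bootstrap sample, searching only the given features."""
    best = None
    for f in features:
        for t in sorted({X[i][f] for i in boot}):
            left = [y[i] for i in boot if X[i][f] <= t]
            right = [y[i] for i in boot if X[i][f] > t]
            if not left or not right:
                continue  # a split must leave samples on both sides
            score = weighted_gini((left, right))
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # no valid split: always predict the majority label
        m = majority([y[i] for i in boot])
        return features[0], float("inf"), m, m
    return best[1:]


def oob_error(X, y, ntree, mtry, seed=0):
    """Out-of-bag error: each sample is voted on only by trees that did not train on it."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    votes = [Counter() for _ in range(n)]
    for _ in range(ntree):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample with replacement
        in_bag = set(boot)
        f, t, left_label, right_label = fit_stump(X, y, boot, rng.sample(range(p), mtry))
        for i in range(n):
            if i not in in_bag:
                votes[i][left_label if X[i][f] <= t else right_label] += 1
    scored = [i for i in range(n) if votes[i]]  # samples that were OOB at least once
    if not scored:
        raise ValueError("no out-of-bag samples; increase ntree")
    wrong = sum(votes[i].most_common(1)[0][0] != y[i] for i in scored)
    return wrong / len(scored)


# Hypothetical tuning grid; the study reports mtry = 4 and ntree = 1000 as its optimum.
X, y = make_toy_data(100, random.Random(42))
results = {(mtry, ntree): oob_error(X, y, ntree, mtry)
           for mtry in (1, 2, 4) for ntree in (10, 30)}
best = min(results, key=results.get)   # combination with the smallest OOB error
accuracy = 1.0 - results[best]         # accuracy is the complement of the OOB error
print(f"best (mtry, ntree) = {best}, OOB error = {results[best]:.3f}, accuracy = {accuracy:.3f}")
```

Because every sample is scored only by trees whose bootstrap sample excluded it, the OOB error acts as a built-in validation estimate, which is why it can be used to compare `mtry` and `ntree` settings without a separate test set.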