Johannes B Ginting, Tri Suci, Chrismis N Ginting, Ermi Girsang
Early detection system of risk factors for diabetes mellitus type 2 utilization of machine learning-random forest
Journal of Family and Community Medicine, vol. 30, no. 3, pp. 171-179. Published 2023-07-01 (Epub 2023-07-24). DOI: https://doi.org/10.4103/jfcm.jfcm_33_23
Full text (PDF): https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/09/8b/JFCM-30-171.PMC10479022.pdf
Abstract
Background: Morbidity and mortality from type 2 diabetes mellitus (DM) continue to rise because of changing lifestyles, and a means of controlling the rising incidence of the disease is needed. Many researchers have applied technological advances such as machine learning to disease prevention and control, especially for noncommunicable diseases. This study therefore set out to create an early detection system for risk factors of type 2 diabetes.
Materials and methods: The study was conducted in February 2022 using secondary surveillance data collected at Puskesmas (community health center) Johar Baru, Jakarta, in 2019, 2020, and 2021. The data for the three years were cleaned, normalized, and merged. They were then analyzed using bivariate and multivariate statistical methods at a 5% significance level and a machine learning method (the random forest algorithm) with an accuracy rate of >80%.
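The random forest step described above can be sketched as follows. This is a minimal illustration, assuming scikit-learn and NumPy are available; the data are synthetic, the feature names and the outcome rule are hypothetical stand-ins for the surveillance variables, and the hyperparameters are not taken from the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-ins for the surveillance variables (NOT the study data).
age = rng.integers(18, 80, size=n)
blood_sugar = rng.normal(120, 35, size=n)
bmi = rng.normal(25, 4, size=n)
family_history = rng.integers(0, 2, size=n)

# Hypothetical outcome rule, used only so the classifier has a signal to learn.
risk = (0.03 * (age - 50) + 0.02 * (blood_sugar - 120)
        + 0.10 * (bmi - 25) + 0.80 * family_history)
y = (risk + rng.normal(0, 1, size=n) > 0.5).astype(int)

feature_names = ["age", "blood_sugar", "bmi", "family_history"]
X = np.column_stack([age, blood_sugar, bmi, family_history])

# Hold out 30% of the records to estimate accuracy on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))

# Impurity-based importances sum to 1; sort them from most to least important.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

print(f"accuracy: {accuracy:.3f}")
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

The held-out accuracy is the quantity one would compare against a threshold such as the >80% mentioned above, and the sorted importances correspond to the kind of feature ranking the abstract reports. The numbers this sketch prints say nothing about the study's own results.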
Results: The final dataset comprised 65,533 visits out of an initial 196,949, and the final type 2 DM population comprised 2,766 out of an initial 9,903. Age, gender, family history of DM, family history of hypertension, hypertension, high blood sugar levels, obesity, and central obesity were significantly associated with type 2 DM. Family history was the strongest risk factor among the independent variables, with an odds ratio of 15.101. The random forest classification reached an accuracy of 84%; ranked by feature importance, the leading features were, in order, age, blood sugar level, and body mass index.
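For readers unfamiliar with the odds ratio reported above, the sketch below shows how a crude odds ratio and its 95% confidence interval (Woolf's log method) are computed from a 2x2 table. The counts are hypothetical and are not the study's data; the abstract does not say whether its odds ratio of 15.101 is crude or adjusted, so this illustrates only the basic definition.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Crude odds ratio with a Woolf (log-based) confidence interval.

    a: exposed with disease      b: exposed without disease
    c: unexposed with disease    d: unexposed without disease
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 60 of 100 exposed and 20 of 100 unexposed have the disease.
or_value, ci_low, ci_high = odds_ratio_with_ci(60, 40, 20, 80)
print(f"OR = {or_value:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

An interval that excludes 1 corresponds to a significant association at roughly the 5% level used in the study's statistical analysis.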
Conclusion: Blood sugar level is the most influential factor in the incidence of DM at Puskesmas Johar Baru. In other words, a person who has a family history of type 2 diabetes, is of nonproductive age, is female, and is overweight can avoid type 2 diabetes by regularly keeping their blood sugar levels under control.