Developing a Predictive Supervised Machine Learning Models for Diabetes

2021 IEEE 7th International Conference on Computing, Engineering and Design (ICCED). Pub Date: 2021-08-05. DOI: 10.1109/ICCED53389.2021.9664833

Divya Kaur Bhullar, Natassha Shievanie Selvaraj, Fung Teng Choong, Chen Wan Jing, K. Xiaoxi, D. Handayani, N. Hamzah, M. Lubis, T. Mantoro



Abstract

Diabetes is often diagnosed late, and a growing number of cases go unnoticed altogether until the disease has reached a later stage. One of the dominant explanations for this trend is the scarcity of prediction tools and techniques for the disease. Previous research has demonstrated that early prediction of diabetes can lower the risk of major health implications and increase the possibility of making improved treatment decisions for patients. This study attempts to design a model that predicts diabetes from a patient's risk factors and lifestyle. We use data from the National Institute of Diabetes and Digestive and Kidney Diseases and visualise it to understand the correlations between nine variables. We then perform data mining using Logistic Regression, Random Forest and Decision Tree classifiers and compare their performance in terms of accuracy and F1-score. Our findings indicate that the Random Forest classifier achieves the highest accuracy of the three algorithms in predicting diabetes, at 79.4%.
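The abstract compares classifiers by accuracy and F1-score. As a minimal sketch of how those two metrics are computed from binary predictions, the following uses only the Python standard library. The labels, the names `model_a` and `model_b`, and the helper functions are invented for illustration; they are not the study's data, models, or results.

```python
from typing import Dict, List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels, where 1 is the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true label."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall: 2*tp / (2*tp + fp + fn)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy labels (1 = diabetic, 0 = non-diabetic), not real data.
y_true: List[int] = [1, 0, 1, 1, 0, 0, 1, 0]
toy_predictions: Dict[str, List[int]] = {
    "model_a": [1, 0, 0, 1, 0, 1, 1, 0],
    "model_b": [1, 0, 1, 1, 0, 1, 1, 0],
}

# Score each hypothetical model on both metrics.
scores = {
    name: (accuracy(y_true, pred), f1_score(y_true, pred))
    for name, pred in toy_predictions.items()
}
for name, (acc, f1) in scores.items():
    print(f"{name}: accuracy={acc:.3f}, F1={f1:.3f}")
```

Reporting F1 alongside accuracy is useful when the positive class is the minority, since a model can reach high accuracy by mostly predicting the majority class while still missing many positive cases.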

