Divya Kaur Bhullar, Natassha Shievanie Selvaraj, Fung Teng Choong, Chen Wan Jing, K. Xiaoxi, D. Handayani, N. Hamzah, M. Lubis, T. Mantoro
Developing a Predictive Supervised Machine Learning Models for Diabetes
2021 IEEE 7th International Conference on Computing, Engineering and Design (ICCED), pp. 1-6. Published 2021-08-05. DOI: 10.1109/ICCED53389.2021.9664833
Citations: 0
Abstract
A growing number of diabetes cases are diagnosed late, or go unnoticed altogether until the disease has reached a later stage. One of the dominant explanations for this trend is the scarcity of prediction tools and techniques for the disease. Previous research has demonstrated that early prediction of diabetes can lower the risk of major health complications and improve the chances of making better treatment decisions for patients. This study attempts to design a model that predicts diabetes from patients' risk factors and lifestyles. We use data from the National Institute of Diabetes and Digestive and Kidney Diseases, and visualise it to understand the correlations between 9 variables. We then perform data mining with Logistic Regression, Random Forest and Decision Tree classifiers and compare their performance in terms of accuracy and F1-score. Our findings indicate that the prediction model using the Random Forest classifier achieves the highest accuracy, 79.4%, in predicting diabetes compared with the other two classifiers.
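The abstract compares classifiers on both accuracy and F1-score. As a minimal sketch of how the two metrics differ, the Python snippet below computes both from a list of true and predicted labels. The function name `accuracy_and_f1` and the toy label vectors are hypothetical, chosen only for illustration; they are not the study's code or data.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels.

    Hypothetical helper for illustration; `positive` marks the
    class treated as "has diabetes".
    """
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts for the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Accuracy: share of all predictions that match the true label.
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)

    # F1: harmonic mean of precision and recall, written as
    # 2*TP / (2*TP + FP + FN); defined as 0 when there are no positives
    # in either the truth or the predictions.
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return accuracy, f1


# Toy labels (made up): 3 positive cases out of 10.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

accuracy, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, F1={f1:.2f}")  # accuracy=0.70, F1=0.40
```

In this toy case the classifier misses two of the three positive cases, yet accuracy is still 0.70 because negatives dominate; the F1-score of 0.40 exposes the weak performance on the positive class. This is one reason to report both metrics when the classes are imbalanced.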