Title: Comparative Approach for Early Diabetes Detection with Machine Learning
Authors: Shilpi Harnal, Arpit Jain, Anshika, Anurita Singh Rathore, Vidhu Baggan, Gagandeep Kaur, Rajni Bala
Published in: 2023 International Conference on Emerging Smart Computing and Informatics (ESCI)
Publication date: 2023-03-01
DOI: 10.1109/ESCI56872.2023.10100186
URL: https://doi.org/10.1109/ESCI56872.2023.10100186
Citation count: 1
Abstract
Diabetes affects a sizeable share of the world's population, and many of those affected are not properly diagnosed. Left undiagnosed, it can eventually lead to serious complications such as kidney failure and blindness, and it raises the risk of heart attack and stroke two- to threefold. This work therefore uses a dataset of 520 instances with 17 features, including polyuria, gender, age, sudden weight loss, polydipsia, polyphagia, weakness, irritability, genital thrush, itching, vision blurring, muscle stiffness, alopecia, delayed healing, and obesity, to classify diabetes at an early stage and so reduce these risks. The objective is to predict diabetes with a variety of machine learning (ML) methods and to identify the model with the highest accuracy. Eight classification algorithms are compared: Support Vector Classifier (SVC), Gaussian Naive Bayes (GNB), Random Forest (RF), Decision Tree Classifier (DTC), Logistic Regression (LR), Extra Tree Classifier (ETC), K-Nearest Neighbors (KNN), and XGBoost (XGB). These were chosen because they gave the highest accuracy on this dataset. In the comparative analysis, ETC achieved the highest accuracy, 98.55%, and can be considered the best-performing and most efficient of these classifiers for diagnosing diabetes from the listed features.
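The abstract does not include code, so the following is only a minimal sketch of the general idea of a comparative analysis: fit several classifiers on the same training split, score each by held-out accuracy, and report the best one. Everything here is an assumption made for illustration. The data is synthetic (not the 520-instance dataset), only KNN is taken from the eight listed algorithms, the majority-class baseline is not part of the paper, and all function names are hypothetical.

```python
import random
from collections import Counter


def make_synthetic(n=200, n_features=6, seed=0):
    """Hypothetical stand-in data: binary symptom flags plus a 0/1 label."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y = rng.randint(0, 1)
        # Assumption: each symptom is present more often in positive cases.
        p = 0.8 if y else 0.2
        x = [1 if rng.random() < p else 0 for _ in range(n_features)]
        rows.append((x, y))
    return rows


def majority_fit(train):
    """Baseline (not in the paper): always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def knn_fit(train, k=5):
    """K-Nearest Neighbors with Hamming distance, which suits binary features."""
    def predict(x):
        nearest = sorted(
            train, key=lambda row: sum(a != b for a, b in zip(row[0], x))
        )[:k]
        return Counter(y for _, y in nearest).most_common(1)[0][0]
    return predict


def accuracy(predict, test):
    """Fraction of test rows whose predicted label matches the true label."""
    return sum(predict(x) == y for x, y in test) / len(test)


def compare(models, data, test_fraction=0.25, seed=0):
    """Fit every model on one shared train split and score it on the held-out rest."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    train, test = rows[:cut], rows[cut:]
    return {name: accuracy(fit(train), test) for name, fit in models.items()}


models = {"majority_baseline": majority_fit, "knn": knn_fit}
scores = compare(models, make_synthetic())
best = max(scores, key=scores.get)
print(scores, "best:", best)
```

A single train/test split is used to keep the sketch short. The abstract does not state the evaluation protocol, so the split ratio and seed above should not be read as the authors' settings.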