A Comparative Study on Different Machine Learning Techniques in Diabetes Risk Assessment

2023 3rd International Conference on Intelligent Technologies (CONIT). Pub Date: 2023-06-23. DOI: 10.1109/CONIT59222.2023.10205382

Mahnur Akther, Zahara Rahman Chowdhury, Anika Tabassum, Md. Saidur Rahman Kohinoor



Abstract

Diabetes is a chronic disease in which the body's ability to process glucose is impaired by insufficient insulin production or utilization, which can result in high blood sugar levels and other health issues. Because the majority of our country's population pays little attention to lifestyle and is unaware of the complications and fatalities that diabetes can cause when it is not detected in time, assessing diabetes risk is an important step in prevention and early detection. Assessing that risk before a medical diagnosis enables timely intervention and management of the disease. This paper aims to provide a solution that helps detect diabetes risk in advance. We selected two diabetes datasets, the PIMA Indian and Sylhet datasets, to evaluate ten models: Support Vector Machine, Random Forest, Naive Bayes, Decision Tree, K-Nearest Neighbor, Logistic Regression, AdaBoost, Gradient Boost, XGBoost, and Multilayer Perceptron. We applied Grid Search and Stratified K-fold cross-validation to assess each model's performance, with the goal of identifying the best-performing model for predicting diabetes. In our analysis, Random Forest performed best on the PIMA Indian dataset with 84.21% accuracy, and Gradient Boost performed best on the Sylhet dataset with 98.85% accuracy. Thus, for these two datasets with different feature sets, Random Forest and Gradient Boost were the best-predicting models, respectively.
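The evaluation procedure named in the abstract, grid search scored by stratified K-fold cross-validation, can be illustrated with a minimal standard-library sketch. Everything here is an assumption made for illustration and not the authors' implementation: the toy two-class data, the small k-nearest-neighbor classifier, the candidate grid, and the helper names (`stratified_k_fold`, `cv_accuracy`, `grid_search`, `make_toy_data`) are all hypothetical. In practice, a library such as scikit-learn offers `StratifiedKFold` and `GridSearchCV` for the same purpose.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Split sample indices into k folds that keep class proportions similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin so every fold gets its share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def knn_predict(train_X, train_y, x, n_neighbors):
    """Majority vote among the n_neighbors closest training points."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = defaultdict(int)
    for _, label in dists[:n_neighbors]:
        votes[label] += 1
    return max(votes, key=votes.get)


def cv_accuracy(X, y, folds, n_neighbors):
    """Mean accuracy when each fold is held out once as the test set."""
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        train_X = [X[j] for j in train_idx]
        train_y = [y[j] for j in train_idx]
        correct = sum(
            knn_predict(train_X, train_y, X[j], n_neighbors) == y[j]
            for j in test_idx
        )
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


def grid_search(X, y, param_grid, k=5, seed=0):
    """Score every candidate on the same folds and return the best one."""
    folds = stratified_k_fold(y, k, seed)
    results = {p: cv_accuracy(X, y, folds, p) for p in param_grid}
    best = max(results, key=results.get)
    return best, results


def make_toy_data(n_per_class=40, seed=1):
    """Hypothetical two-class data: Gaussian blobs centered at 0 and 2."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((0, 0.0), (1, 2.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
            y.append(label)
    return X, y


X, y = make_toy_data()
best_k, scores = grid_search(X, y, param_grid=[1, 3, 5, 7])
print("best n_neighbors:", best_k)
print("mean CV accuracy per candidate:", scores)
```

Scoring every candidate on the same stratified folds keeps the comparison fair: each hyperparameter value sees identical train/test splits, and each split preserves the class balance of the full dataset.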

