A Comparative Study on Different Machine Learning Techniques in Diabetes Risk Assessment
Authors: Mahnur Akther, Zahara Rahman Chowdhury, Anika Tabassum, Md. Saidur Rahman Kohinoor
Published in: 2023 3rd International Conference on Intelligent Technologies (CONIT)
Publication date: 2023-06-23
DOI: 10.1109/CONIT59222.2023.10205382 (https://doi.org/10.1109/CONIT59222.2023.10205382)
Citation count: 0
Abstract
Diabetes is a chronic disease in which the body's ability to process glucose is impaired because of insufficient insulin production or utilization. This can result in high blood sugar levels and other health issues. Because the majority of our country's population is not conscious of its lifestyle and is unaware of the complications and fatalities that diabetes can cause if it is not detected in time, assessing the risk of diabetes is an important step in prevention and early detection. Assessing diabetes risk before a medical diagnosis is crucial for timely intervention and management of the disease. This paper aims to provide a solution that helps detect the disease risk beforehand. We selected two diabetes datasets, the PIMA Indian dataset and the Sylhet dataset, and evaluated ten models on them: Support Vector Machine, Random Forest, Naive Bayes, Decision Tree, K-Nearest Neighbor, Logistic Regression, AdaBoost, Gradient Boost, XGBoost, and Multilayer Perceptron. We applied Grid Search and Stratified K-fold cross-validation to assess each model's performance. Our goal is to determine the best-performing model for predicting diabetes. In our analysis, Random Forest performed best on the PIMA Indian dataset with 84.21% accuracy, and Gradient Boost performed best on the Sylhet dataset with 98.85% accuracy. Thus, for two datasets with different feature sets, Random Forest and Gradient Boost, respectively, were the best-predicting models.
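The evaluation protocol named in the abstract, Grid Search combined with Stratified K-fold cross-validation, can be sketched as follows. This is a minimal illustration assuming scikit-learn; the abstract does not state which library, fold count, or hyperparameter grid the authors used, so the synthetic data, the five folds, and the grid values below are all assumptions rather than the paper's actual setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in data (assumption): a binary classification problem
# with 8 features. This is NOT the PIMA Indian or Sylhet dataset.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, random_state=0
)

# Stratified K-fold keeps the class ratio roughly equal in every fold.
# Five folds is an assumed value.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Hypothetical hyperparameter grid; Grid Search tries every combination.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=cv,
    scoring="accuracy",
)
search.fit(X, y)

best_params = search.best_params_  # combination with the highest mean accuracy
best_score = search.best_score_    # mean cross-validated accuracy for it
print(best_params, round(best_score, 4))
```

The same pattern would apply to any of the ten models by swapping the estimator and its grid. Because the data here is synthetic, the printed accuracy says nothing about the 84.21% and 98.85% figures reported above.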