Inam Abousaber, Haitham F Abdallah, Hany El-Ghaish
{"title":"Robust predictive framework for diabetes classification using optimized machine learning on imbalanced datasets.","authors":"Inam Abousaber, Haitham F Abdallah, Hany El-Ghaish","doi":"10.3389/frai.2024.1499530","DOIUrl":null,"url":null,"abstract":"<p><strong>Introduction: </strong>Diabetes prediction using clinical datasets is crucial for medical data analysis. However, class imbalances, where non-diabetic cases dominate, can significantly affect machine learning model performance, leading to biased predictions and reduced generalization.</p><p><strong>Methods: </strong>A novel predictive framework employing cutting-edge machine learning algorithms and advanced imbalance handling techniques was developed. The framework integrates feature engineering and resampling strategies to enhance predictive accuracy.</p><p><strong>Results: </strong>Rigorous testing was conducted on three datasets-PIMA, Diabetes Dataset 2019, and BIT_2019-demonstrating the robustness and adaptability of the methodology across varying data environments.</p><p><strong>Discussion: </strong>The experimental results highlight the critical role of model selection and imbalance mitigation in achieving reliable and generalizable diabetes predictions. This study offers significant contributions to medical informatics by proposing a robust data-driven framework that addresses class imbalance challenges, thereby advancing diabetes prediction accuracy.</p>","PeriodicalId":33315,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"7 ","pages":"1499530"},"PeriodicalIF":3.0000,"publicationDate":"2025-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11747138/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/frai.2024.1499530","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Introduction: Diabetes prediction using clinical datasets is crucial for medical data analysis. However, class imbalances, where non-diabetic cases dominate, can significantly affect machine learning model performance, leading to biased predictions and reduced generalization.
Methods: A novel predictive framework employing cutting-edge machine learning algorithms and advanced imbalance handling techniques was developed. The framework integrates feature engineering and resampling strategies to enhance predictive accuracy.
Results: Rigorous testing was conducted on three datasets-PIMA, Diabetes Dataset 2019, and BIT_2019-demonstrating the robustness and adaptability of the methodology across varying data environments.
Discussion: The experimental results highlight the critical role of model selection and imbalance mitigation in achieving reliable and generalizable diabetes predictions. This study offers significant contributions to medical informatics by proposing a robust data-driven framework that addresses class imbalance challenges, thereby advancing diabetes prediction accuracy.