Robust predictive framework for diabetes classification using optimized machine learning on imbalanced datasets.

Impact factor: 3.0. JCR quartile: Q2. Category: Computer Science, Artificial Intelligence
Frontiers in Artificial Intelligence. Pub Date: 2025-01-07. eCollection Date: 2024-01-01. DOI: 10.3389/frai.2024.1499530
Inam Abousaber, Haitham F Abdallah, Hany El-Ghaish
Volume 7, article 1499530. Publication type: Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11747138/pdf/
Citations: 0

Abstract

Introduction: Diabetes prediction using clinical datasets is crucial for medical data analysis. However, class imbalances, where non-diabetic cases dominate, can significantly affect machine learning model performance, leading to biased predictions and reduced generalization.
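To make the imbalance problem concrete, here is a minimal Python sketch. The 90/10 class split and the always-predict-non-diabetic classifier are hypothetical illustrations, not data or models from the paper. It shows how plain accuracy can look strong while the minority (diabetic) class is missed entirely.

```python
# Illustrative only: a hypothetical 90/10 class split, not data from the paper.

def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for binary labels."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def recall(y_true, y_pred):
    """Fraction of positive (diabetic) cases that were detected."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(y_true, y_pred):
    """Fraction of negative (non-diabetic) cases correctly rejected."""
    _, fp, tn, _ = confusion_counts(y_true, y_pred)
    return tn / (tn + fp) if (tn + fp) else 0.0


def balanced_accuracy(y_true, y_pred):
    """Mean of recall and specificity; insensitive to class proportions."""
    return (recall(y_true, y_pred) + specificity(y_true, y_pred)) / 2


# 90 non-diabetic (0) and 10 diabetic (1) cases; the model always predicts 0.
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100

print("accuracy:", accuracy(y_true, y_pred))
print("recall:", recall(y_true, y_pred))
print("balanced accuracy:", balanced_accuracy(y_true, y_pred))
```

In this toy case the classifier scores high on accuracy yet detects no diabetic cases, which is why imbalance-aware metrics such as recall or balanced accuracy are commonly reported alongside accuracy.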

Methods: A novel predictive framework employing cutting-edge machine learning algorithms and advanced imbalance handling techniques was developed. The framework integrates feature engineering and resampling strategies to enhance predictive accuracy.
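The abstract does not say which resampling strategy the framework uses. As one common possibility, the sketch below shows random oversampling of the minority class in plain Python. The function name `random_oversample` and the toy feature rows are assumptions made for illustration and should not be read as the authors' method.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest one.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        # Indices of the original rows belonging to this class.
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            j = rng.choice(idx)
            X_out.append(X[j])
            y_out.append(label)
    return X_out, y_out


# Toy data: 6 non-diabetic rows (0) and 2 diabetic rows (1).
X = [[85, 22.1], [90, 24.3], [88, 23.0], [92, 25.5],
     [80, 21.7], [95, 26.0], [160, 33.4], [172, 35.1]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Resampling of this kind is normally applied to the training split only, so that duplicated minority rows cannot leak into the data used for evaluation.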

Results: Rigorous testing was conducted on three datasets (PIMA, Diabetes Dataset 2019, and BIT_2019), demonstrating the robustness and adaptability of the methodology across varying data environments.

Discussion: The experimental results highlight the critical role of model selection and imbalance mitigation in achieving reliable and generalizable diabetes predictions. This study offers significant contributions to medical informatics by proposing a robust data-driven framework that addresses class imbalance challenges, thereby advancing diabetes prediction accuracy.

Source journal metrics: CiteScore 6.10; self-citation rate 2.50%; articles published 272; review time 13 weeks.