分析不同糖尿病数据集的糖尿病预测分类和特征选择策略。

IF 3 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Frontiers in Artificial Intelligence Pub Date : 2024-08-21 eCollection Date: 2024-01-01 DOI:10.3389/frai.2024.1421751
Jayakumar Kaliappan, I J Saravana Kumar, S Sundaravelan, T Anesh, R R Rithik, Yashbir Singh, Diana V Vera-Garcia, Yassine Himeur, Wathiq Mansoor, Shadi Atalla, Kathiravan Srinivasan
{"title":"分析不同糖尿病数据集的糖尿病预测分类和特征选择策略。","authors":"Jayakumar Kaliappan, I J Saravana Kumar, S Sundaravelan, T Anesh, R R Rithik, Yashbir Singh, Diana V Vera-Garcia, Yassine Himeur, Wathiq Mansoor, Shadi Atalla, Kathiravan Srinivasan","doi":"10.3389/frai.2024.1421751","DOIUrl":null,"url":null,"abstract":"<p><strong>Introduction: </strong>In the evolving landscape of healthcare and medicine, the merging of extensive medical datasets with the powerful capabilities of machine learning (ML) models presents a significant opportunity for transforming diagnostics, treatments, and patient care.</p><p><strong>Methods: </strong>This research paper delves into the realm of data-driven healthcare, placing a special focus on identifying the most effective ML models for diabetes prediction and uncovering the critical features that aid in this prediction. The prediction performance is analyzed using a variety of ML models, such as Random Forest (RF), XG Boost (XGB), Linear Regression (LR), Gradient Boosting (GB), and Support VectorMachine (SVM), across numerousmedical datasets. The study of feature importance is conducted using methods including Filter-based, Wrapper-based techniques, and Explainable Artificial Intelligence (Explainable AI). By utilizing Explainable AI techniques, specifically Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP), the decision-making process of the models is ensured to be transparent, thereby bolstering trust in AI-driven decisions.</p><p><strong>Results: </strong>Features identified by RF in Wrapper-based techniques and the Chi-square in Filter-based techniques have been shown to enhance prediction performance. A notable precision and recall values, reaching up to 0.9 is achieved in predicting diabetes.</p><p><strong>Discussion: </strong>Both approaches are found to assign considerable importance to features like age, family history of diabetes, polyuria, polydipsia, and high blood pressure, which are strongly associated with diabetes. In this age of data-driven healthcare, the research presented here aspires to substantially improve healthcare outcomes.</p>","PeriodicalId":33315,"journal":{"name":"Frontiers in Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":3.0000,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11371799/pdf/","citationCount":"0","resultStr":"{\"title\":\"Analyzing classification and feature selection strategies for diabetes prediction across diverse diabetes datasets.\",\"authors\":\"Jayakumar Kaliappan, I J Saravana Kumar, S Sundaravelan, T Anesh, R R Rithik, Yashbir Singh, Diana V Vera-Garcia, Yassine Himeur, Wathiq Mansoor, Shadi Atalla, Kathiravan Srinivasan\",\"doi\":\"10.3389/frai.2024.1421751\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Introduction: </strong>In the evolving landscape of healthcare and medicine, the merging of extensive medical datasets with the powerful capabilities of machine learning (ML) models presents a significant opportunity for transforming diagnostics, treatments, and patient care.</p><p><strong>Methods: </strong>This research paper delves into the realm of data-driven healthcare, placing a special focus on identifying the most effective ML models for diabetes prediction and uncovering the critical features that aid in this prediction. The prediction performance is analyzed using a variety of ML models, such as Random Forest (RF), XG Boost (XGB), Linear Regression (LR), Gradient Boosting (GB), and Support VectorMachine (SVM), across numerousmedical datasets. The study of feature importance is conducted using methods including Filter-based, Wrapper-based techniques, and Explainable Artificial Intelligence (Explainable AI). By utilizing Explainable AI techniques, specifically Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP), the decision-making process of the models is ensured to be transparent, thereby bolstering trust in AI-driven decisions.</p><p><strong>Results: </strong>Features identified by RF in Wrapper-based techniques and the Chi-square in Filter-based techniques have been shown to enhance prediction performance. A notable precision and recall values, reaching up to 0.9 is achieved in predicting diabetes.</p><p><strong>Discussion: </strong>Both approaches are found to assign considerable importance to features like age, family history of diabetes, polyuria, polydipsia, and high blood pressure, which are strongly associated with diabetes. In this age of data-driven healthcare, the research presented here aspires to substantially improve healthcare outcomes.</p>\",\"PeriodicalId\":33315,\"journal\":{\"name\":\"Frontiers in Artificial Intelligence\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":3.0000,\"publicationDate\":\"2024-08-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11371799/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Frontiers in Artificial Intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3389/frai.2024.1421751\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/frai.2024.1421751","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

摘要

简介在不断发展的医疗保健领域,广泛的医疗数据集与机器学习(ML)模型的强大功能相结合,为诊断、治疗和患者护理的变革带来了重大机遇:本研究论文深入探讨了数据驱动的医疗保健领域,重点是确定最有效的糖尿病预测 ML 模型,并揭示有助于该预测的关键特征。在众多医疗数据集中,使用随机森林(RF)、XG Boost(XGB)、线性回归(LR)、梯度提升(GB)和支持向量机(SVM)等多种 ML 模型分析了预测性能。研究特征重要性的方法包括基于过滤器的技术、基于封装的技术和可解释人工智能(Explainable AI)。通过利用可解释的人工智能技术,特别是本地可解释模型-不可知解释(LIME)和SHAPLE Additive exPlanations(SHAP),确保了模型决策过程的透明性,从而增强了对人工智能驱动决策的信任:结果:在基于封装的技术中,通过射频识别的特征和在基于过滤器的技术中,通过奇偶校验识别的特征都能提高预测性能。在预测糖尿病方面,精确度和召回值显著提高,达到 0.9:这两种方法都非常重视年龄、糖尿病家族史、多尿、多饮和高血压等与糖尿病密切相关的特征。在这个数据驱动医疗保健的时代,本文介绍的研究有望大幅改善医疗保健结果。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Analyzing classification and feature selection strategies for diabetes prediction across diverse diabetes datasets.

Introduction: In the evolving landscape of healthcare and medicine, the merging of extensive medical datasets with the powerful capabilities of machine learning (ML) models presents a significant opportunity for transforming diagnostics, treatments, and patient care.

Methods: This research paper delves into the realm of data-driven healthcare, placing a special focus on identifying the most effective ML models for diabetes prediction and uncovering the critical features that aid in this prediction. The prediction performance is analyzed using a variety of ML models, such as Random Forest (RF), XG Boost (XGB), Linear Regression (LR), Gradient Boosting (GB), and Support VectorMachine (SVM), across numerousmedical datasets. The study of feature importance is conducted using methods including Filter-based, Wrapper-based techniques, and Explainable Artificial Intelligence (Explainable AI). By utilizing Explainable AI techniques, specifically Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP), the decision-making process of the models is ensured to be transparent, thereby bolstering trust in AI-driven decisions.

Results: Features identified by RF in Wrapper-based techniques and the Chi-square in Filter-based techniques have been shown to enhance prediction performance. A notable precision and recall values, reaching up to 0.9 is achieved in predicting diabetes.

Discussion: Both approaches are found to assign considerable importance to features like age, family history of diabetes, polyuria, polydipsia, and high blood pressure, which are strongly associated with diabetes. In this age of data-driven healthcare, the research presented here aspires to substantially improve healthcare outcomes.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
6.10
自引率
2.50%
发文量
272
审稿时长
13 weeks
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信