Explainable Machine Learning for Efficient Diabetes Prediction Using Hyperparameter Tuning, SHAP Analysis, Partial Dependency, and LIME

Impact factor: 1.8 · JCR Q3 (Computer Science, Interdisciplinary Applications)
Md. Manowarul Islam, Habibur Rahman Rifat, Md. Shamim Bin Shahid, Arnisha Akhter, Md Ashraf Uddin, Khandaker Mohammad Mohi Uddin
Journal: Engineering reports : open access, vol. 7, no. 1. Published: 2024-12-12. DOI: 10.1002/eng2.13080. URL: https://onlinelibrary.wiley.com/doi/10.1002/eng2.13080
Citations: 0

Abstract

Diabetes is a chronic metabolic disease characterized by elevated blood glucose levels, and it poses significant health risks such as cardiovascular disease and cognitive damage. Understanding the causes of diabetes is crucial to managing it and preventing complications. The clinical community holds a large volume of diabetes diagnostic data, and machine learning algorithms can help uncover hidden patterns in it, retrieve information from databases, and predict outcomes. To build a more accurate diabetes classifier, this study applies random oversampling and hyperparameter tuning. Whereas most existing methods were developed on a single dataset, our proposed model is designed on two benchmark datasets to make it more generally applicable: the BRFSS dataset, which has multiple classes, and the Diabetes 2019 dataset, which has binary classes. To make the model easier to interpret, we also apply several explainability methods that previous work has rarely used: SHapley Additive Explanations (SHAP), Partial Dependency, and Local Interpretable Model-agnostic Explanations (LIME). The resulting explainability charts let end users and practitioners see which factors drive a given diagnostic prediction. This research focuses on classifying type 2 diabetes with machine learning and explaining the model's predictions. Random oversampling and a quantile transform are used to correct imbalances in the datasets and make model training more robust. Tuning hyperparameters with GridSearchCV yielded high accuracy on both the binary and multi-class datasets. We evaluate the proposed model on the two datasets using performance metrics. The extra trees classifier (ET) performed particularly well, achieving 97.23% accuracy on the multi-class dataset and 97.45% on the binary dataset.
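
The steps named in the abstract (random oversampling, quantile transform, grid search over an extra trees classifier, partial dependence) can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data is synthetic (neither BRFSS nor Diabetes 2019), the `random_oversample` helper, the parameter grid, and the choice to oversample only the training split are all hypothetical, and SHAP and LIME are left out because they need the third-party `shap` and `lime` packages.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.inspection import partial_dependence
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import QuantileTransformer


def random_oversample(X, y, seed=0):
    """Hypothetical helper: resample each class with replacement
    until it matches the majority-class count."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    picked = []
    for cls, n in zip(classes, counts):
        members = np.flatnonzero(y == cls)
        extra = rng.choice(members, size=target - n, replace=True)
        picked.append(np.concatenate([members, extra]))
    idx = np.concatenate(picked)
    rng.shuffle(idx)
    return X[idx], y[idx]


# Synthetic, imbalanced binary data standing in for a diabetes dataset.
X, y = make_classification(n_samples=600, n_features=8, n_informative=5,
                           weights=[0.85, 0.15], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Assumption: oversample the training split only, so the test split
# keeps its original class balance and contains no duplicated rows.
X_bal, y_bal = random_oversample(X_train, y_train)

pipe = Pipeline([
    ("quantile", QuantileTransformer(n_quantiles=100,
                                     output_distribution="normal",
                                     random_state=0)),
    ("et", ExtraTreesClassifier(random_state=0)),
])

# Illustrative grid; the paper's actual search space is not given here.
param_grid = {"et__n_estimators": [50, 100], "et__max_depth": [None, 8]}
search = GridSearchCV(pipe, param_grid, cv=3, scoring="accuracy")
search.fit(X_bal, y_bal)

# Cross-validation scores are optimistic here because duplicated rows
# can land in different folds; the held-out split is the fairer number.
test_accuracy = search.score(X_test, y_test)

# Average partial dependence of the prediction on feature 0.
pd_result = partial_dependence(search.best_estimator_, X_bal,
                               features=[0], kind="average",
                               grid_resolution=20)

print("best params:", search.best_params_)
print("held-out accuracy:", round(test_accuracy, 3))
```

Because extra trees split on thresholds, a monotonic transform such as the quantile transform is expected to have little effect on the classifier itself; it is kept in the pipeline only to mirror the preprocessing the abstract describes.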

Source journal: CiteScore 5.10; self-citation rate 0.00%; articles published 0; review time 19 weeks