Diabetes prediction using machine learning and explainable AI techniques

Impact factor 2.8, JCR Q3 (Engineering, Biomedical)
Isfafuzzaman Tasin, Tansin Ullah Nabil, Sanjida Islam, Riasat Khan
Healthcare Technology Letters, vol. 10 (1-2), pp. 1-10. Published 2022-12-14. DOI: 10.1049/htl2.12039
Full text: https://onlinelibrary.wiley.com/doi/10.1049/htl2.12039
Citations: 14

Abstract


Globally, diabetes affects 537 million people, making it one of the deadliest and most common non-communicable diseases. Many factors can contribute to diabetes, including excess body weight, abnormal cholesterol levels, family history, physical inactivity, and poor dietary habits. Increased urination is one of its most common symptoms. People who have had diabetes for a long time can develop complications such as heart disease, kidney disease, nerve damage, and diabetic retinopathy, but the risk can be reduced if the disease is predicted early. In this paper, an automatic diabetes prediction system is developed using a private dataset of female patients in Bangladesh and various machine learning techniques. The authors used the Pima Indian diabetes dataset and collected additional samples from 203 individuals at a local textile factory in Bangladesh. Mutual information is applied as the feature selection algorithm. A semi-supervised model with extreme gradient boosting (XGBoost) is used to predict the insulin feature of the private dataset. The SMOTE and ADASYN approaches are employed to handle class imbalance. The authors compared several machine learning classifiers, namely decision tree, SVM, random forest, logistic regression, KNN, and various ensemble techniques, to determine which produces the best predictions. After training and testing all the classification models, the XGBoost classifier with ADASYN gave the best result, with 81% accuracy, an F1 score of 0.81, and an AUC of 0.84. A domain adaptation method is also implemented to demonstrate the versatility of the proposed system. Explainable AI with the LIME and SHAP frameworks is used to understand how the model arrives at its predictions. Finally, a website framework and an Android smartphone application have been developed to take the input features and predict diabetes instantly. The private dataset of female Bangladeshi patients and the source code are available at: https://github.com/tansin-nabil/Diabetes-Prediction-Using-Machine-Learning.
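
The abstract names mutual information as the feature selection criterion but does not show how it scores a feature. The sketch below is one minimal way to illustrate the idea, not the authors' implementation: it assumes features have already been discretized into bins, uses only the Python standard library, and runs on made-up toy data. The names `mutual_information`, `rank_features`, and the three toy columns are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information I(X; Y), in nats, between two discrete sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (feature value, label) pair
    px = Counter(xs)              # marginal counts of feature values
    py = Counter(ys)              # marginal counts of labels
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written in terms of counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def rank_features(columns, labels):
    """Return (name, score) pairs sorted by decreasing mutual information."""
    scores = {
        name: mutual_information(values, labels)
        for name, values in columns.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data, not taken from the paper: 1 = diabetic, 0 = not.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    # Perfectly aligned with the label.
    "glucose_bin": ["high", "high", "high", "high", "low", "low", "low", "low"],
    # Mostly aligned with the label.
    "bmi_bin": ["high", "high", "high", "low", "high", "low", "low", "low"],
    # Independent of the label.
    "noise_bin": ["a", "b", "a", "b", "a", "b", "a", "b"],
}

ranking = rank_features(columns, labels)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```

On this toy data the perfectly aligned feature scores highest and the independent one scores zero, so a selection step would keep the top-ranked columns. Continuous clinical measurements would need binning or a nearest-neighbour estimator first; which of these the paper used is not stated in the abstract.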

Source journal: Healthcare Technology Letters (Health Professions: Health Information Management)
CiteScore: 6.10
Self-citation rate: 4.80%
Articles published: 12
Review time: 22 weeks
Journal description: Healthcare Technology Letters aims to bring together an audience of biomedical and electrical engineers, physical and computer scientists, and mathematicians to enable the exchange of the latest ideas and advances through rapid online publication of original healthcare technology research. Major themes of the journal include (but are not limited to) the following. Major technological/methodological areas: biomedical signal processing; biomedical imaging and image processing; bioinstrumentation (sensors, wearable technologies, etc.); biomedical informatics. Major application areas: cardiovascular and respiratory systems engineering; neural engineering and neuromuscular systems; rehabilitation engineering; bio-robotics, surgical planning and biomechanics; therapeutic and diagnostic systems, devices and technologies; clinical engineering; healthcare information systems, telemedicine, and mHealth.