Understanding Arteriosclerotic Heart Disease Patients Using Electronic Health Records: A Machine Learning and Shapley Additive exPlanations Approach.

IF 2.3 Q3 MEDICAL INFORMATICS
Eka Miranda, Suko Adiarto, Faqir M Bhatti, Alfi Yusrotis Zakiyyah, Mediana Aryuni, Charles Bernando
Published: 2023-07-01 (Journal Article)
DOI: https://doi.org/10.4258/hir.2023.29.3.228
Full text (PDF): https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/e7/3a/hir-2023-29-3-228.PMC10440196.pdf
Citations: 1

Abstract

Objectives: The number of deaths from cardiovascular disease is projected to reach 23.3 million by 2030. To contribute to prevention efforts, this paper proposed a machine learning (ML) model to predict arteriosclerotic heart disease (AHD) in patients. We also interpreted the results of the prediction model and applied model-agnostic ML methods to identify informative features and explain their contributions.

Methods: We used hematology data from an Electronic Health Record (EHR), with information on erythrocytes, hematocrit, hemoglobin, mean corpuscular hemoglobin, mean corpuscular hemoglobin concentration, leukocytes, thrombocytes, age, and sex. To detect and predict AHD, we explored random forest (RF), XGBoost, and AdaBoost models. We evaluated the prediction models using the confusion matrix and accuracy measures. We used the Shapley Additive exPlanations (SHAP) framework to interpret the ML model and quantify the contribution of each feature to its predictions.
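
To illustrate what "quantifying the contribution of features" means, the following is a minimal sketch of an exact Shapley value computation for a single observation. It is not the SHAP library and not the authors' pipeline: the scoring function `toy_model`, its coefficients, the feature values, and the single-row `baseline` are all hypothetical, chosen only to make the arithmetic visible. SHAP itself uses efficient approximations and a background dataset rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one observation x.

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2^n, so this is only practical for a handful of features.
    """
    n = len(x)
    features = range(n)

    def value(coalition):
        # Model output when only the features in `coalition` take their real value.
        z = [x[i] if i in coalition else baseline[i] for i in features]
        return predict(z)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical risk score, NOT the model from the paper.
    hemoglobin, mchc, age = z
    return 0.04 * hemoglobin + 0.02 * mchc + 0.001 * hemoglobin * age


x = [15.0, 34.0, 60.0]         # hypothetical patient: hemoglobin, MCHC, age
baseline = [13.0, 33.0, 50.0]  # hypothetical reference values

phi = shapley_values(toy_model, x, baseline)
for name, contribution in zip(["hemoglobin", "mchc", "age"], phi):
    print(f"{name}: {contribution:+.4f}")
```

The contributions sum to the difference between the prediction for `x` and the prediction for `baseline`, which is the additive property that a per-observation SHAP bar plot visualizes.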

Results: Our study included data from 6,837 patients, with 4,702 records from patients diagnosed with AHD and 2,135 records from patients without an AHD diagnosis. AdaBoost outperformed RF and XGBoost, achieving an accuracy of 0.78, precision of 0.82, F1-score of 0.85, and recall of 0.88. According to the SHAP summary bar plot, hemoglobin was the most important attribute for detecting and predicting AHD patients. The SHAP local interpretability bar plot, which explains a single observation, showed that hemoglobin and mean corpuscular hemoglobin concentration contributed positively to the AHD prediction.
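
As a quick sanity check on the reported figures, the sketch below recomputes the F1-score from the stated precision and recall and derives the share of AHD records in the dataset. The numbers are taken from the abstract; the variable and function names are illustrative. The abstract does not say which data split the metrics were computed on, so the class share is only a rough reference point for the accuracy, not a like-for-like baseline.

```python
def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Metrics reported for AdaBoost in the abstract.
reported = {"accuracy": 0.78, "precision": 0.82, "recall": 0.88, "f1": 0.85}

f1_check = f1_from(reported["precision"], reported["recall"])

# Record counts reported in the abstract.
n_ahd, n_non_ahd = 4702, 2135
ahd_share = n_ahd / (n_ahd + n_non_ahd)

print(f"F1 recomputed from precision and recall: {f1_check:.2f}")
print(f"Share of AHD records in the dataset: {ahd_share:.3f}")
```

The recomputed F1 rounds to 0.85, matching the reported value. The AHD share of roughly 0.688 is the accuracy a classifier would reach on the full dataset by always predicting AHD, which helps put the reported accuracy of 0.78 in context.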

Conclusions: ML models based on real clinical data can be used to predict AHD.

Source journal: Healthcare Informatics Research (Medical Informatics)
CiteScore: 4.90
Self-citation rate: 6.90%
Articles published: 44