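Understanding Arteriosclerotic Heart Disease Patients Using Electronic Health Records: A Machine Learning and Shapley Additive exPlanations Approach.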

Impact Factor: 2.1 | JCR: Q3 | Category: Medical Informatics

Healthcare Informatics Research Pub Date : 2023-07-01 DOI:10.4258/hir.2023.29.3.228

Eka Miranda, Suko Adiarto, Faqir M Bhatti, Alfi Yusrotis Zakiyyah, Mediana Aryuni, Charles Bernando


Citations: 1

Abstract


Understanding Arteriosclerotic Heart Disease Patients Using Electronic Health Records: A Machine Learning and Shapley Additive exPlanations Approach.


Objectives: The number of deaths from cardiovascular disease is projected to reach 23.3 million by 2030. As a contribution to preventing this phenomenon, this paper proposed a machine learning (ML) model to predict patients with arteriosclerotic heart disease (AHD). We also interpreted the prediction model results based on the ML approach and deployed model-agnostic ML methods to identify informative features and their interpretations.

Methods: We used a hematology Electronic Health Record (EHR) with information on erythrocytes, hematocrit, hemoglobin, mean corpuscular hemoglobin, mean corpuscular hemoglobin concentration, leukocytes, thrombocytes, age, and sex. To detect and predict AHD, we explored random forest (RF), XGBoost, and AdaBoost models. We examined the prediction model results based on the confusion matrix and accuracy measures. We used the Shapley Additive exPlanations (SHAP) framework to interpret the ML model and quantify the contribution of features to predictions.
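The attribution idea behind SHAP can be illustrated with a minimal sketch that computes exact Shapley values by enumerating every feature coalition. This is not the authors' code and does not use the SHAP library; the toy model, its coefficients, the standardized feature values, and the all-zero baseline are hypothetical and chosen only to keep the arithmetic easy to follow. Brute-force enumeration grows exponentially with the number of features, so practical tools rely on approximations or model-specific algorithms instead.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values by enumerating all coalitions of the other features."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                without_f = set(subset)
                with_f = without_f | {f}
                total += weight * (value_fn(with_f) - value_fn(without_f))
        phi[f] = total
    return phi


# Hypothetical standardized values for one patient, and an all-zero baseline.
x = {"hemoglobin": 1.0, "mchc": 2.0, "age": -1.0}
baseline = {"hemoglobin": 0.0, "mchc": 0.0, "age": 0.0}


def toy_model(v):
    # Hypothetical score: two linear terms plus one interaction term.
    return 0.4 * v["hemoglobin"] + 0.1 * v["mchc"] + 0.2 * v["hemoglobin"] * v["age"]


def value_fn(coalition):
    # Features outside the coalition are replaced by their baseline value.
    mixed = {name: (x[name] if name in coalition else baseline[name]) for name in x}
    return toy_model(mixed)


phi = shapley_values(value_fn, list(x))
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")
```

In this toy case the interaction term is split equally between hemoglobin and age, and the three contributions sum to the difference between the model output for the patient and the output at the baseline, which is the additivity property that the SHAP bar plots rely on.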

Results: Our study included data from 6,837 patients, with 4,702 records from patients diagnosed with AHD and 2,135 records from patients without an AHD diagnosis. AdaBoost outperformed RF and XGBoost, achieving an accuracy of 0.78, precision of 0.82, F1-score of 0.85, and recall of 0.88. According to the SHAP summary bar plot method, hemoglobin was the most important attribute for detecting and predicting AHD patients. The SHAP local interpretability bar plot revealed that hemoglobin and mean corpuscular hemoglobin concentration had positive impacts on AHD prediction based on a single observation.
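The reported figures can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the abstract; the majority-class baseline is computed from the overall class counts and assumes the evaluation data had a similar class ratio, which the abstract does not state.

```python
def f1_from_precision_recall(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values reported for AdaBoost in the abstract.
reported = {"accuracy": 0.78, "precision": 0.82, "recall": 0.88, "f1": 0.85}

f1 = f1_from_precision_recall(reported["precision"], reported["recall"])

# Class counts reported in the abstract.
n_ahd, n_non_ahd = 4702, 2135
n_total = n_ahd + n_non_ahd

# Accuracy of always predicting the majority class (AHD), given these counts.
majority_baseline = n_ahd / n_total

print(f"F1 recomputed from precision and recall: {f1:.4f}")
print(f"Total records: {n_total}")
print(f"Majority-class baseline accuracy: {majority_baseline:.4f}")
```

The recomputed F1 rounds to the reported 0.85, and the two class counts add up to the reported 6,837 records. Under the stated assumption, the reported accuracy of 0.78 can be read against a baseline of roughly 0.69 for a classifier that always predicts AHD.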

Conclusions: ML models based on real clinical data can be used to predict AHD.


Source journal

Healthcare Informatics Research (Medical Informatics)

CiteScore

4.90

Self-citation rate

6.90%

Number of publications