Enhancing automatic early arteriosclerosis prediction: an explainable machine learning evidence

Clinical eHealth. Pub Date: 2024-12-01. DOI: 10.1016/j.ceh.2024.12.003

Eka Miranda, Suko Adiarto

Clinical eHealth, volume 7, pages 153-163. Journal Article. https://www.sciencedirect.com/science/article/pii/S2588914124000169


Abstract

Objective

This paper proposed a machine learning (ML) model for the early prediction of arteriosclerotic heart disease (AHD) in patients. We also used model-agnostic explanation methods to identify and analyze the features that most inform the model's predictions.

Methods

We used hematology data from an Electronic Health Record (EHR) covering erythrocytes, hematocrit, hemoglobin, mean corpuscular hemoglobin, mean corpuscular hemoglobin concentration, leukocytes, thrombocytes, age, and sex. For ML-based AHD detection we evaluated Decision Tree (DT), Random Forest (RF), Logistic Regression (LR), Bagging Decision Tree (BDT), and Bagging Logistic Regression (BLR). To handle class imbalance and improve classifier accuracy, we applied bagging and the Synthetic Minority Oversampling Technique (SMOTE). We then used the Shapley Additive exPlanations (SHAP) framework to explain the ML model and quantify each feature's contribution to its predictions.
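The core idea of SMOTE can be sketched in a few lines: each synthetic minority row is placed at a random point on the segment between a minority row and one of its nearest minority neighbours. The sketch below is a pure-Python illustration, not the authors' pipeline; the function name `smote`, its parameters, and the toy hemoglobin/hematocrit rows are all assumptions made for the example.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen row (excluding the row itself)
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: math.dist(base, row),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical toy rows [hemoglobin g/dL, hematocrit %]; not data from the study.
majority = [[14.1, 42.0], [13.8, 41.2], [15.0, 44.5],
            [14.6, 43.1], [13.5, 40.3], [14.9, 44.0]]
minority = [[11.2, 34.0], [10.8, 33.1], [11.9, 35.6]]

# Oversample the minority class until both classes have the same size.
new_rows = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_rows
print(len(majority), len(balanced_minority))
```

Because every synthetic row is an interpolation between two real minority rows, it stays inside the range the minority class already spans. In practice, oversampling is normally applied to the training split only, so that synthetic rows do not leak into evaluation.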

Results

RF trained on SMOTE-balanced data outperformed the other models on nearly all performance measures, achieving accuracy, precision, recall, F1-score, and ROC AUC of 82.12%, 81.31%, 83.37%, 82.57%, and 89%, respectively. According to the SHAP summary bar plot of global feature importance, hemoglobin was the most important attribute for detecting and predicting AHD patients. Local interpretability, in the form of a force plot, showed how a single observation's prediction was composed and the magnitude of the SHAP value for each feature. On a case-by-case basis, hemoglobin, erythrocytes, hematocrit, mean corpuscular hemoglobin (hermch), mean corpuscular hemoglobin concentration (khermchc), leukocytes, thrombocytes, and age all contributed positively to the prediction of class 1 (AHD patients), whereas gender contributed negatively. For class 0 (patients with no AHD), thrombocytes, hematocrit, and gender contributed positively, but leukocytes, erythrocytes, hemoglobin, and khermchc contributed negatively.
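For readers unfamiliar with the reported measures, the following sketch shows how accuracy, precision, recall, F1-score, and ROC AUC are derived from labels and predicted scores. It is a minimal illustration on invented toy values, not the study's data; the helper names `classification_metrics` and `roc_auc` and the 0.5 decision threshold are assumptions.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 from the confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores, positive=1):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = AHD, 0 = no AHD) and model scores; not results from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Note that ROC AUC is computed from the ranking of the scores rather than from thresholded predictions, which is why it can differ noticeably from accuracy, as it does in the figures reported above.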

Conclusion

Explainable ML supported early AHD prediction because it opened up black-box ML models and showed how each feature contributed to the final prediction.
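What "how each feature contributed" means can be made concrete with an exact Shapley-value computation on a small model. The sketch below enumerates every feature coalition by brute force and replaces absent features with a baseline value; the SHAP library relies on faster, model-specific or sampling-based estimators, so this is only an illustration of the underlying definition. The scoring function `toy_model`, the baseline, and the patient row are hypothetical and unrelated to the study's fitted model.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one row; absent features take their baseline value."""
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(row):
    """Hypothetical risk score with a hemoglobin-by-age interaction (illustration only)."""
    hemoglobin, age, leukocytes = row
    low_hb = 14.0 - hemoglobin
    older = age - 50.0
    return 0.05 * low_hb + 0.004 * older + 0.001 * low_hb * older + 0.01 * (leukocytes - 7.0)


features = ["hemoglobin", "age", "leukocytes"]
baseline = [14.0, 50.0, 7.0]  # assumed reference patient
patient = [11.0, 65.0, 9.0]   # assumed single observation

phi = shapley_values(toy_model, patient, baseline)
for name, contribution in zip(features, phi):
    print(f"{name}: {contribution:+.4f}")
```

The contributions sum to the difference between the patient's score and the baseline score, which is the additivity property a force plot visualizes for a single observation. Averaging the absolute contributions over many observations gives the kind of global ranking shown in a SHAP summary bar plot.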

