Development and validation of an explainable machine learning prediction model of hemorrhagic transformation after intravenous thrombolysis in stroke
Authors: Yanan Lin, Yan Li, Yayin Luo, Jie Han
Journal: Frontiers in Neurology, volume 15, article 1446250 (impact factor 2.7; JCR Q2, Clinical Neurology)
Published: 2025-01-15 (eCollection 2024)
DOI: https://doi.org/10.3389/fneur.2024.1446250
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11775651/pdf/
Citations: 0
Abstract
Objective: To develop and validate an explainable machine learning (ML) model predicting the risk of hemorrhagic transformation (HT) after intravenous thrombolysis.
Methods: We retrospectively enrolled patients who received intravenous tissue plasminogen activator (IV-tPA) thrombolysis within 4.5 h of symptom onset to form the original modeling cohort. HT was defined as any hemorrhage on a head CT scan completed within 48 h after IV-tPA administration. We used the Random Forest (RF), Multilayer Perceptron (MLP), Adaptive Boosting (AdaBoost), and Gaussian Naive Bayes (GauNB) algorithms to develop ML-HT models. Predictive performance was evaluated in the original cohort using confusion-matrix metrics (accuracy, precision, recall, and F1 score) and discrimination analysis (area under the receiver operating characteristic curve, ROC-AUC), followed by validation in an independent external cohort. Explainability was assessed using the SHapley Additive exPlanations (SHAP) global feature plot, the SHAP summary plot, and the partial dependence plot.
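The evaluation metrics named in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the function names `confusion_metrics` and `roc_auc`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the study's own pipeline may compute these quantities differently.

```python
from typing import Dict, Sequence


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = confusion_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics)  # all four metrics equal 0.75 on this toy data
print(auc)      # 0.9375
```

The confusion-matrix metrics depend on the chosen threshold and on which class is treated as positive, whereas ROC-AUC depends only on how the scores rank the cases.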
Results: A total of 1,007 patients were included in the original modeling cohort, with an HT incidence of 8.94%. The RF-based ML-HT model achieved an accuracy of 0.874, a precision of 0.972, a recall of 0.890, and an F1 score of 0.929, with a ROC-AUC of 0.7847 in the original cohort and 0.7119 in the external validation cohort. The corresponding values, in the same order, were 0.878, 0.967, 0.989, 0.978, 0.7710, and 0.6768 for the MLP model; 0.907, 0.967, 0.989, 0.978, 0.7798, and 0.6606 for the AdaBoost model; and 0.848, 0.983, 0.598, 0.716, 0.6953, and 0.6289 for the GauNB model. The explainability analysis of the RF-based ML model indicated that the National Institutes of Health Stroke Scale (NIHSS) score, age, platelet count, and atrial fibrillation were the primary determinants of HT following IV-tPA thrombolysis.
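The F1 score is the harmonic mean of precision and recall, so it can be recomputed from the other two reported values. The short sketch below does this for the RF and MLP figures; the helper name `f1_from_precision_recall` is hypothetical, and the calculation uses only the rounded values quoted in the abstract.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    total = precision + recall
    if total == 0:
        return 0.0
    return 2 * precision * recall / total


# Rounded precision and recall as reported in the abstract.
rf_f1 = f1_from_precision_recall(0.972, 0.890)
mlp_f1 = f1_from_precision_recall(0.967, 0.989)

print(round(rf_f1, 3), round(mlp_f1, 3))  # prints 0.929 0.978
```

Because the inputs are already rounded to three decimals, a recomputed F1 can differ slightly from a value calculated on the unrounded counts.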
Conclusion: The RF-based explainable ML model showed promising ability to predict the risk of HT after IV-tPA thrombolysis and may help support clinical decision-making in emergency settings.
About the journal:
The section Stroke aims to quickly and accurately publish important experimental, translational and clinical studies, and reviews that contribute to the knowledge of stroke, its causes, manifestations, diagnosis, and management.