Developing an Interpretable Machine Learning Model for Early Prediction of Cardiovascular Involvement in Systemic Lupus Erythematosus
Zixian Deng, Huadong Liu, Feng Chen, Qiyun Liu, Xiaoyu Wang, Caiping Wang, Chuangye Lyu, Jianghua Li, Tangzhiming Li
Journal of Inflammation Research, 18: 8629-8641 (2025-07-01). DOI: 10.2147/JIR.S526608 (https://doi.org/10.2147/JIR.S526608)
Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12230251/pdf/
Citations: 0
Abstract
Background: Cardiovascular disease is a leading cause of death in systemic lupus erythematosus (SLE). Early prediction of cardiac involvement is critical for improving patient outcomes. This study aimed to identify key factors associated with cardiac involvement in SLE and to develop an interpretable machine learning (ML) model for risk prediction.
Methods: We conducted a retrospective analysis of 1,023 patients with SLE hospitalized at Shenzhen People's Hospital between January 2000 and December 2021. The median age at hospitalization was 31 years (IQR: 25-39 years), 92.1% of patients were female, and 18.77% developed cardiovascular involvement during a median follow-up of 3,737 days (IQR: 1,920-5,246 days). Predictive features were selected as the intersection of the features chosen by three feature selection techniques: Random Forest, LASSO, and XGBoost. Models were trained on 70% of the dataset and tested on the remaining 30%. Of the seven algorithms evaluated, the Gradient Boosting Machine (GBM) performed best on the test set. Model interpretability was assessed with the DALEX package, which generated feature importance plots and instance-level break-down profiles that visualize the model's decision-making logic.
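The feature-selection and data-splitting steps described in the Methods can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the three per-method selection lists are hypothetical placeholders (in practice they would come from Random Forest importances, non-zero LASSO coefficients, and XGBoost importances), the names `var_A`, `var_B`, and `var_C` are invented, and stratifying the 70/30 split by outcome is an assumption, since the abstract does not say how the split was drawn. The helper names `intersect_selections` and `stratified_split` are likewise hypothetical.

```python
"""Intersection-based feature selection and a 70/30 split (illustrative sketch)."""
import random


def intersect_selections(*selections):
    """Keep features chosen by every selector, in the order of the first list."""
    common = set(selections[0]).intersection(*map(set, selections[1:]))
    return [feature for feature in selections[0] if feature in common]


def stratified_split(labels, train_frac=0.7, seed=0):
    """Split indices so each outcome class is divided ~train_frac / (1 - train_frac).

    Stratifying by outcome is an ASSUMPTION; the abstract only states 70/30.
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * train_frac)
        train_idx += idx[:cut]
        test_idx += idx[cut:]
    return sorted(train_idx), sorted(test_idx)


# HYPOTHETICAL per-method selections; var_A/var_B/var_C are invented names.
rf_selected = ["arthritis", "hypertension", "HDL-C", "LDL-C",
               "total_cholesterol", "CRP", "ESR", "var_A"]
lasso_selected = ["hypertension", "arthritis", "HDL-C", "LDL-C",
                  "total_cholesterol", "CRP", "ESR", "var_B"]
xgb_selected = ["CRP", "ESR", "arthritis", "hypertension", "HDL-C",
                "LDL-C", "total_cholesterol", "var_A", "var_C"]

features = intersect_selections(rf_selected, lasso_selected, xgb_selected)
print(features)  # the seven features shared by all three lists

# Outcome labels matching the cohort counts: 192 cases among 1,023 patients.
labels = [1] * 192 + [0] * 831
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 716 307
```

Taking the intersection is a conservative design: a variable survives only if three methods with different inductive biases (tree-based importance, L1 shrinkage, boosted-tree importance) all retain it.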
Results: Over a median follow-up of 3,737 days, 192 patients (18.77%) developed cardiac involvement. Seven key predictors were identified from 51 clinical and biological variables recorded at admission: arthritis, hypertension, high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), total cholesterol, C-reactive protein (CRP), and erythrocyte sedimentation rate (ESR). The GBM model performed best of the seven models (AUC: 0.748, accuracy: 0.779, precision: 0.605, recall: 0.338, F1 score: 0.433).
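The reported metrics are linked by simple formulas, which a short sketch makes concrete. Only the precision (0.605), recall (0.338), and 192/1,023 prevalence come from the abstract; the helper names are hypothetical and the toy AUC scores are invented. The sketch checks that the reported F1 score is the harmonic mean of the reported precision and recall, computes the accuracy of a classifier that always predicts "no cardiac involvement" at the cohort prevalence, and implements AUC in its rank (Mann-Whitney) form.

```python
"""Sketch of the classification metrics reported for the GBM model."""


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def confusion_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


def roc_auc(labels, scores):
    """P(random positive outscores random negative); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# F1 implied by the reported precision and recall.
print(round(f1_from(0.605, 0.338), 3))  # 0.434, the reported 0.433 up to rounding

# Accuracy of always predicting "no cardiac involvement" at the cohort prevalence.
prevalence = 192 / 1023
print(round(1 - prevalence, 3))  # 0.812

# HYPOTHETICAL toy scores, only to exercise the AUC helper.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

Because about 81% of the cohort never developed cardiac involvement, accuracy is best read alongside recall and AUC, which speak more directly to how well at-risk patients are identified.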
Conclusion: This study is the first to develop an interpretable ML model for predicting the risk of cardiac involvement in SLE. The GBM model performed best among the models evaluated, and its interpretability allows clinicians to visualize the model's decision-making process, facilitating early identification of high-risk patients.
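Instance-level break-down profiles, produced here with the DALEX package, explain one patient's prediction by fixing that patient's variable values one at a time and recording how the average model output shifts. The sketch below illustrates that sequential idea, which underlies DALEX's `predict_parts()` with `type = "break_down"`; it is not the authors' analysis. The scoring function `toy_risk`, its coefficients, the four-record dataset, and the helper names are hypothetical, and the variable order is fixed by hand, whereas DALEX chooses an order heuristically by default.

```python
"""Sequential break-down of one prediction (toy, hypothetical model and data)."""


def mean_prediction(model, rows):
    """Average model output over a list of patient records (dicts)."""
    return sum(model(row) for row in rows) / len(rows)


def break_down(model, data, instance, order):
    """Attribute model(instance) to variables, fixing them in the given order.

    Returns (baseline, [(variable, contribution), ...], final_mean), where
    final_mean equals model(instance) once every variable the model uses is fixed.
    """
    current = [dict(row) for row in data]  # copies we overwrite step by step
    baseline = mean_prediction(model, current)
    previous = baseline
    contributions = []
    for var in order:
        for row in current:
            row[var] = instance[var]  # set this variable to the patient's value
        now = mean_prediction(model, current)
        contributions.append((var, now - previous))
        previous = now
    return baseline, contributions, previous


def toy_risk(patient):
    """HYPOTHETICAL additive risk score; not the study's GBM."""
    return 0.1 + 0.3 * patient["hypertension"] + 0.1 * patient["arthritis"]


toy_data = [  # HYPOTHETICAL records, 1 = present, 0 = absent
    {"hypertension": 0, "arthritis": 0},
    {"hypertension": 1, "arthritis": 0},
    {"hypertension": 0, "arthritis": 1},
    {"hypertension": 0, "arthritis": 0},
]
patient = {"hypertension": 1, "arthritis": 1}

baseline, parts, prediction = break_down(
    toy_risk, toy_data, patient, ["hypertension", "arthritis"]
)
print(f"baseline (mean prediction): {baseline:.3f}")
for var, contribution in parts:
    print(f"{var}: {contribution:+.3f}")
print(f"prediction: {prediction:.3f}")
```

The baseline plus the contributions always sums to the patient's prediction. For an additive toy like this one the contributions do not depend on the order; for a non-additive model such as a GBM they can, which is why the ordering step matters in practice.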
About the journal:
An international, peer-reviewed, open access, online journal that welcomes laboratory and clinical findings on the molecular basis, cell biology and pharmacology of inflammation.