Developing an Interpretable Machine Learning Model for Early Prediction of Cardiovascular Involvement in Systemic Lupus Erythematosus.

IF 4.2 · Zone 2 (Medicine) · JCR Q2 (Immunology)
Journal of Inflammation Research · Pub Date: 2025-07-01 · eCollection Date: 2025-01-01 · DOI: 10.2147/JIR.S526608
Zixian Deng, Huadong Liu, Feng Chen, Qiyun Liu, Xiaoyu Wang, Caiping Wang, Chuangye Lyu, Jianghua Li, Tangzhiming Li
Journal of Inflammation Research, vol. 18, pp. 8629-8641. Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12230251/pdf/
Citations: 0

Abstract

Background: Cardiovascular disease is a leading cause of death in systemic lupus erythematosus (SLE). Early prediction of cardiac involvement is critical for improving patient outcomes. This study aimed to identify key factors associated with cardiac involvement in SLE and to develop an interpretable machine learning (ML) model for risk prediction.

Methods: We conducted a retrospective analysis of 1,023 SLE patients hospitalized at Shenzhen People's Hospital between January 2000 and December 2021. The median age at hospitalization was 31 years (IQR: 25-39 years), 92.1% were female, and 18.77% developed cardiovascular involvement during a median follow-up of 3,737 days (IQR: 1,920-5,246 days). The most predictive features were those retained by all three feature selection techniques (Random Forest, LASSO, and XGBoost), that is, the intersection of their selected sets. Models were trained on 70% of the dataset and tested on the remaining 30%. Among the seven algorithms evaluated, the Gradient Boosting Machine (GBM) performed best on the test set. Model interpretability was assessed with the DALEX package, which generated feature importance plots and instance-level break-down profiles to visualize the model's decision logic.
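The selection-by-intersection and 70/30 split described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names `intersect_selected` and `split_70_30`, the placeholder feature names, and the random seed are all assumptions, and the abstract does not say whether the actual split was stratified.

```python
import random


def intersect_selected(*selections):
    """Return the features chosen by every selector, in the first selector's order."""
    common = set(selections[0]).intersection(*selections[1:])
    return [name for name in selections[0] if name in common]


def split_70_30(n_rows, seed=0):
    """Shuffle row indices and split them into 70% train and 30% test."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    cut = int(round(0.7 * n_rows))
    return indices[:cut], indices[cut:]


# Hypothetical selector outputs (placeholder names, NOT the study's variables).
rf_pick = ["f1", "f2", "f3", "f5"]
lasso_pick = ["f2", "f3", "f4", "f5"]
xgb_pick = ["f3", "f5", "f2", "f6"]

shared = intersect_selected(rf_pick, lasso_pick, xgb_pick)
train_idx, test_idx = split_70_30(1023)  # 1,023 is the cohort size from the abstract
print(shared, len(train_idx), len(test_idx))
```

Taking the intersection is a conservative choice: a feature survives only if all three selectors agree on it, which tends to give a small, stable predictor set. With 1,023 rows this sketch produces 716 training and 307 test indices; the exact split sizes used in the study are not stated in the abstract.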

Results: Over a median follow-up of 3,737 days, 192 patients (18.77%) developed cardiac involvement. Seven key predictors (arthritis, hypertension, HDL-C, LDL-C, total cholesterol, CRP, and ESR) were identified from 51 clinical and biological variables recorded at admission. The Gradient Boosting Machine (GBM) model performed best of the seven models (AUC: 0.748, accuracy: 0.779, precision: 0.605, recall: 0.338, F1 score: 0.433).
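The reported figures can be cross-checked with a few lines of arithmetic. The sketch below recomputes the F1 score from the stated precision and recall using the standard harmonic-mean formula, and the incidence from the stated counts; the function name `f1_from` is an illustrative assumption, and the inputs are the rounded values from the abstract, so a small difference in the third decimal place is expected.

```python
def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Rounded values as reported in the abstract.
precision, recall = 0.605, 0.338
f1 = f1_from(precision, recall)

# Share of the 1,023 patients who developed cardiac involvement.
incidence_pct = 100 * 192 / 1023

print(f"F1 = {f1:.4f}, incidence = {incidence_pct:.2f}%")
```

The recomputed F1 agrees with the reported 0.433 to within rounding of the inputs, and 192 of 1,023 matches the stated 18.77%. By definition, a recall of 0.338 means that roughly one in three true cases in the test set was flagged, which is worth keeping in mind next to the higher accuracy figure given that only about 19% of patients are positive.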

Conclusion: This study is the first to develop an interpretable ML model to predict the risk of cardiac involvement in SLE. The GBM model performed best among the models evaluated, and its interpretability allows clinicians to visualize how predictions are made, facilitating early identification of high-risk patients.

Source journal
Journal of Inflammation Research (Immunology and Microbiology: Immunology)
CiteScore: 6.10
Self-citation rate: 2.20%
Articles published: 658
Review time: 16 weeks
About the journal: An international, peer-reviewed, open access, online journal that welcomes laboratory and clinical findings on the molecular basis, cell biology and pharmacology of inflammation.