Prediction model for postoperative pulmonary complications after thoracoscopic surgery with machine learning algorithms and SHapley Additive exPlanations.

IF 2.1; Zone 3 (Medicine); JCR Q3, Respiratory System

Journal of thoracic disease Pub Date : 2025-06-30 Epub Date: 2025-06-23 DOI:10.21037/jtd-24-1853

Shenyan Wang, Yongqi Lin, Hujuan Shi, Pengcheng Liang, Zihao Luo, Junfeng Kong, Junda Huang, Mingmei Cheng, Baoliang Zhang, Yanzhong Wang, Hongxing Kan, Lizhong Liang, Wanqing Xie

Journal of Thoracic Disease, volume 17, issue 6, pages 3603-3618. Journal Article. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12268710/pdf/


Abstract

Background: Postoperative pulmonary complications (PPCs) are common, have a negative impact on postoperative morbidity and mortality, and are associated with increased medical resource use and cost of care. Management of preoperative and intraoperative risk factors has been shown to reduce the occurrence of PPCs. Therefore, this study aimed to develop a risk prediction model for PPCs based on explainable machine learning (ML) methods and to evaluate its predictive performance in order to enhance the prevention of, and intervention for, PPCs.

Methods: The medical records of 1,629 patients who underwent thoracoscopic surgery were collected from two clinical groups at the Affiliated Hospital of Guangdong Medical University between August 2018 and October 2021. Five categories of data were used as predictors: patient demographics, medical history and comorbidities, laboratory studies, intraoperative vital signs, and surgical procedure-related data. Seven ML methods were used to predict the occurrence of PPCs in patients undergoing thoracoscopic surgery: five individual algorithms, namely random forest (RF), adaptive boosting (AdaBoost), extra trees (ET), extreme gradient boosting (XGBoost), and gradient boosting decision tree (GBDT), and two ensemble learning methods, namely a voting classifier (Voting) and stacking-logistic regression (Stacking-LR). Model performance was validated in internal, temporal, and external phases. Additionally, an explainable approach based on the ML methods and the SHapley Additive exPlanations (SHAP) algorithm was used to calculate PPC risk and to generate individual explanations of the model decisions.
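To make the idea of an individual SHAP explanation concrete, the following is a minimal sketch of exact Shapley values computed by enumerating feature coalitions. It is not the study's implementation: the `toy_risk` function, its three features, the coefficients, and the "average patient" baseline are all hypothetical, and in practice the SHAP library uses faster approximations or model-specific algorithms rather than this exponential enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance.

    Features outside a coalition are replaced by their baseline value,
    which is one common (assumed) way to represent a "missing" feature.
    """
    n = len(x)
    features = range(n)

    def value(coalition):
        # Model output with coalition features from x and the rest from baseline.
        z = [x[i] if i in coalition else baseline[i] for i in features]
        return predict(z)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy risk score (NOT the study's model):
# inputs are age in years, smoker flag (0/1), and surgery duration in hours.
def toy_risk(z):
    age, smoker, hours = z
    return 0.002 * age + 0.10 * smoker + 0.03 * hours + 0.05 * smoker * hours


patient = [70, 1, 3.0]
baseline = [55, 0, 2.0]  # assumed reference patient
phi = shapley_values(toy_risk, patient, baseline)

for name, contribution in zip(["age", "smoker", "hours"], phi):
    print(f"{name}: {contribution:+.3f}")
print(f"sum of contributions: {sum(phi):.3f}")
print(f"risk difference vs. baseline: {toy_risk(patient) - toy_risk(baseline):.3f}")
```

The "additive" property is visible in the last two printed lines: the per-feature contributions sum to the difference between this patient's predicted risk and the baseline prediction, which is what lets each prediction be read as a baseline plus a contribution per factor.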

Results: In the model validation phase, the RF algorithm performed well across all types of validation compared with the other ML algorithms: internal validation on the within-center dataset, area under the curve (AUC) = 0.82 [95% confidence interval (CI): 0.80-0.84]; temporal validation on the within-center dataset, AUC = 0.73 (95% CI: 0.71-0.75); external validation on the cross-center dataset, AUC = 0.76 (95% CI: 0.75-0.77). The SHAP analysis generated a model-agnostic explanation that illustrated the top 20 clinical factors associated with the risk of PPCs.
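The abstract does not state how the 95% CIs for the AUC were obtained. As one possible illustration, the sketch below computes the AUC from its pairwise-ranking definition and a percentile bootstrap interval around it. The data are synthetic, and the function names, the number of resamples, and the seeds are assumptions made for this example only.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate and percentile bootstrap CI for the AUC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:  # resample lacks one class; draw again
            continue
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[round((alpha / 2) * n_boot)]
    upper = stats[round((1 - alpha / 2) * n_boot) - 1]
    return auc(labels, scores), lower, upper


# Synthetic labels and scores for illustration only (not study data).
data_rng = random.Random(42)
labels = [1 if data_rng.random() < 0.3 else 0 for _ in range(300)]
scores = [data_rng.gauss(0.6 if y else 0.4, 0.15) for y in labels]

point, lower, upper = bootstrap_auc_ci(labels, scores)
print(f"AUC = {point:.2f} (95% CI: {lower:.2f}-{upper:.2f})")
```

The pairwise `auc` function is quadratic in the number of cases, which is fine for a cohort of this size; a rank-based formula or a library routine would be preferable for much larger datasets.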

Conclusions: The risk prediction model for PPCs based on explainable ML methods is valid and can therefore be implemented in clinical settings to identify patients at high risk of PPCs and to provide appropriate clinical advice on targeted interventions and improved monitoring to alleviate modifiable risk factors.




Source journal

Journal of Thoracic Disease (Respiratory System)

CiteScore: 4.60

Self-citation rate: 4.00%

Articles published: 254

Journal introduction: The Journal of Thoracic Disease (JTD, J Thorac Dis, pISSN: 2072-1439; eISSN: 2077-6624) was founded in Dec 2009 and was indexed in PubMed in Dec 2011 and in the Science Citation Index (SCI) in Feb 2013. It was published quarterly (Dec 2009-Dec 2011), then bimonthly (Jan 2012-Dec 2013), and has been published monthly since Jan 2014; it is openly distributed worldwide. JTD received an impact factor of 2.365 for the year 2016. JTD publishes manuscripts that describe new findings and provide current, practical information on the diagnosis and treatment of conditions related to thoracic disease. All submission and review are conducted electronically to ensure rapid review.