Prediction model for postoperative pulmonary complications after thoracoscopic surgery with machine learning algorithms and SHapley Additive exPlanations.
Authors: Shenyan Wang, Yongqi Lin, Hujuan Shi, Pengcheng Liang, Zihao Luo, Junfeng Kong, Junda Huang, Mingmei Cheng, Baoliang Zhang, Yanzhong Wang, Hongxing Kan, Lizhong Liang, Wanqing Xie
Journal: Journal of Thoracic Disease, 2025;17(6):3603-3618. Published 2025-06-30 (Epub 2025-06-23). Publication type: Journal Article. Impact factor: 2.1; JCR: Q3 (Respiratory System).
DOI: https://doi.org/10.21037/jtd-24-1853
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12268710/pdf/
Citation count: 0
Abstract
Background: Postoperative pulmonary complications (PPCs) are common and adversely affect postoperative morbidity and mortality, with associated increases in medical resource use and cost of care. Management of preoperative and intraoperative risk factors has been shown to reduce the occurrence of PPCs. Therefore, this study aimed to develop a risk prediction model for PPCs based on explainable machine learning (ML) methods and to evaluate its predictive performance, in order to improve the prevention of and intervention for PPCs.
Methods: The medical records of 1,629 patients who underwent thoracoscopic surgery were collected from two clinical groups at the Affiliated Hospital of Guangdong Medical University between August 2018 and October 2021. Five categories of data were used as predictors: patient demographics, medical history and comorbidities, laboratory studies, intraoperative vital signs, and surgical procedure-related data. Seven ML methods were used to predict the occurrence of PPCs in patients undergoing thoracoscopic surgery: five individual algorithms, namely random forest (RF), adaptive boosting (AdaBoost), extra trees (ET), extreme gradient boosting (XGBoost), and gradient boosting decision tree (GBDT), and two ensemble learning methods, namely a voting classifier (Voting) and stacking-logistic regression (Stacking-LR). Model performance was validated in internal, temporal, and external phases. Additionally, an explainable approach based on ML methods and the SHapley Additive exPlanations (SHAP) algorithm was used to calculate the risk of PPCs and to generate individual explanations of the model decisions.
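To make the idea behind SHAP more concrete, the following is a minimal sketch of exact Shapley value attribution for a single patient. It is not the study's pipeline: the abstract does not describe the implementation, and SHAP tooling normally relies on faster approximations and a background dataset rather than brute-force enumeration. The function `toy_score`, its three predictors, its coefficients, and the patient and baseline values are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance by enumerating feature subsets.

    Features outside a coalition are replaced by their baseline value
    (an assumption of this sketch; other conventions exist).
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_score(features):
    """Hypothetical risk score; NOT the model from the study."""
    age, surgery_hours, smoker = features
    return (0.01 * age + 0.05 * surgery_hours + 0.10 * smoker
            + 0.02 * surgery_hours * smoker)


patient = [70, 3.0, 1]    # hypothetical patient
baseline = [55, 2.0, 0]   # hypothetical reference patient
phi = shapley_values(toy_score, patient, baseline)
print(phi)
```

The contributions are additive: they sum to the difference between the patient's score and the baseline score, which is the property that lets a per-patient explanation be read as a breakdown of the predicted risk. In this toy case the interaction between surgery duration and smoking is split between those two predictors.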
Results: In the validation phase, the RF algorithm performed well in all types of validation compared with the other ML algorithms: internal validation on the within-center dataset, area under the curve (AUC) = 0.82 [95% confidence interval (CI): 0.80-0.84]; temporal validation on the within-center dataset, AUC = 0.73 (95% CI: 0.71-0.75); and external validation on the cross-center dataset, AUC = 0.76 (95% CI: 0.75-0.77). The SHAP analysis generated a model-agnostic explanation that illustrated the top 20 clinical factors associated with the risk of PPCs.
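As an illustration of how an AUC and its 95% CI can be obtained, the sketch below computes the AUC as the fraction of positive-negative pairs ranked correctly and estimates the interval with a percentile bootstrap. The abstract does not state how the reported intervals were computed, so the bootstrap is an assumption of this sketch, and the labels and scores are made-up toy data rather than study results.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random
    negative; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (resampling cases with replacement)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data: 1 = complication occurred, 0 = no complication (made up)
labels = [0, 0, 1, 0, 1, 1, 0, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.8, 0.2, 0.7, 0.65, 0.9]

point = auc(labels, scores)
lower, upper = bootstrap_auc_ci(labels, scores)
print(f"AUC = {point:.2f} (95% CI: {lower:.2f}-{upper:.2f})")
```

With only ten toy cases the interval is very wide; the narrow intervals reported above reflect much larger validation sets.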
Conclusions: The risk prediction model for PPCs based on explainable ML methods is valid and can therefore be implemented in clinical settings to identify patients at high risk of PPCs and to provide appropriate clinical advice on targeted interventions and improved monitoring to alleviate modifiable risk factors.
About the journal:
The Journal of Thoracic Disease (JTD, J Thorac Dis, pISSN: 2072-1439; eISSN: 2077-6624) was founded in Dec 2009, and was indexed in PubMed in Dec 2011 and in the Science Citation Index (SCI) in Feb 2013. It was published quarterly (Dec 2009 - Dec 2011), then bimonthly (Jan 2012 - Dec 2013), and has been published monthly since Jan 2014; it is openly distributed worldwide. JTD received an impact factor of 2.365 for the year 2016. JTD publishes manuscripts that describe new findings and provide current, practical information on the diagnosis and treatment of conditions related to thoracic disease. All submission and reviewing are conducted electronically so that rapid review is assured.