90-Day mortality prediction in elective visceral surgery using machine learning: a retrospective multicenter development, validation and comparison study.
Christoph Riepe, Robin van de Water, Axel Winter, Bjarne Pfitzner, Lara Faraj, Robert Ahlborn, Maximilian Schulze, Daniela Zuluaga, Christian Schineis, Katharina Beyer, Johann Pratschke, Bert Arnrich, Igor M Sauer, Max M Maurer
International Journal of Surgery. Published 2025-04-01. DOI: 10.1097/JS9.0000000000002372 (https://doi.org/10.1097/JS9.0000000000002372)
Abstract
Background: Machine learning (ML) is increasingly being adopted in biomedical research; however, its potential for outcome prediction in visceral surgery remains uncertain. This study compares an aggregated multi-organ ML approach to preoperative 90-day mortality (90DM) prediction with conventional scoring systems and with individual organ-specific models.
Methods: This retrospective cohort study enrolled patients undergoing major elective visceral surgery between 2014 and 2022 at two tertiary centers. Multiple ML models for preoperative 90DM prediction were trained, externally validated, and benchmarked against the American Society of Anesthesiologists (ASA) score and the revised Charlson Comorbidity Index (rCCI). Areas under the receiver operating characteristic curve (AUROC) and the precision-recall curve (AUPRC) were calculated, including standard deviations. Additionally, individual models for esophageal, gastric, intestinal, liver, and pancreatic surgery were developed and compared with the aggregated approach.
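The two metrics named in the Methods can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`auroc`, `average_precision`, `summarize`) and the toy data are hypothetical, and the abstract does not state how the standard deviations were obtained, so aggregation over repeated runs or folds is an assumption here. Average precision is used as a common step-wise estimate of AUPRC.

```python
from statistics import mean, stdev


def auroc(y_true, y_score):
    """AUROC as the probability that a random positive case is scored
    higher than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """Step-wise AUPRC estimate: mean of the precision values observed at
    each positive case when cases are ranked by descending score.
    Assumes no tied scores (ties are kept in input order)."""
    ranked = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])
    hits, precision_sum = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive case")
    return precision_sum / hits


def summarize(values):
    """Mean and sample standard deviation over repeated evaluations
    (assumed here to be folds or repetitions; needs at least two values)."""
    return mean(values), stdev(values)


# Toy example with hypothetical labels and scores (not study data).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores), average_precision(labels, scores))
```

In this reading, a result such as "AUROC: 0.86 [0.01]" would correspond to the pair returned by `summarize` over the per-run AUROC values.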
Results: A total of 7,711 cases encompassing 78 features were included. Overall 90DM was 4% (n = 309). An XGBoost classifier demonstrated the best performance and high robustness following external validation (AUROC: 0.86 [0.01]; AUPRC: 0.2 [0.04]). All models outperformed the ASA score (AUROC: 0.72; AUPRC: 0.08) and the rCCI (AUROC: 0.81; AUPRC: 0.11). rCCI, patient age, and C-reactive protein emerged as the most decisive model features. Models for gastric surgery (AUROC: 0.88 [0.13]; AUPRC: 0.24 [0.26]) and intestinal surgery (AUROC: 0.87 [0.05]; AUPRC: 0.17 [0.09]) showed the highest organ-specific performance, while the model for pancreatic surgery yielded the lowest results (AUROC: 0.66 [0.08]; AUPRC: 0.22 [0.12]). A combined multi-organ approach (AUROC: 0.84 [0.04]; AUPRC: 0.21 [0.06]) outperformed the weighted average across all organ-specific models (AUROC: 0.82 [0.07]; AUPRC: 0.2 [0.13]).
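To put the AUPRC values in context: a classifier that ranks cases at random has an expected AUPRC equal to the event prevalence. The short sketch below computes that baseline from the overall figures reported above (309 deaths in 7,711 cases) and expresses each reported AUPRC as a multiple of it. The helper names are hypothetical, and the comparison assumes that prevalence in the evaluation data is close to the overall 4%, which the abstract does not confirm.

```python
def prevalence(n_events, n_cases):
    """Event rate; also the expected AUPRC of a random classifier."""
    return n_events / n_cases


def auprc_lift(auprc, baseline):
    """How many times higher an AUPRC is than the chance-level baseline."""
    return auprc / baseline


# Overall 90DM figures from the Results; per-cohort prevalence is assumed similar.
baseline = prevalence(309, 7711)
print(f"chance-level AUPRC: {baseline:.3f}")

# AUPRC values as reported in the Results.
reported = [("best ML model", 0.20), ("rCCI", 0.11), ("ASA score", 0.08)]
for name, auprc in reported:
    print(f"{name}: AUPRC {auprc:.2f}, {auprc_lift(auprc, baseline):.1f}x baseline")
```

Read this way, an AUPRC of 0.2 is low in absolute terms but roughly five times the chance level for an outcome this rare.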
Conclusion: ML offers robust preoperative risk stratification for 90DM in elective visceral surgery. Leveraging training across multi-organ cohorts may improve accuracy and robustness compared to organ-specific models. Prospective studies are needed to confirm the potential of ML in surgical outcome prediction.
About the journal:
The International Journal of Surgery (IJS) has a broad scope, encompassing all surgical specialties. Its primary objective is to facilitate the exchange of crucial ideas and lines of thought between and across these specialties. By doing so, the journal aims to counter the growing trend toward sub-specialization, which can result in "tunnel vision" and the isolation of significant surgical advancements within specific specialties.