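90-Day mortality prediction in elective visceral surgery using machine learning: a retrospective multicenter development, validation and comparison study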

IF 12.5 · Zone 2 (Medicine) · JCR Q1 (Surgery)
Christoph Riepe, Robin van de Water, Axel Winter, Bjarne Pfitzner, Lara Faraj, Robert Ahlborn, Maximilian Schulze, Daniela Zuluaga, Christian Schineis, Katharina Beyer, Johann Pratschke, Bert Arnrich, Igor M Sauer, Max M Maurer
Journal: International Journal of Surgery
Published: 2025-04-01
Publication type: Journal Article
DOI: 10.1097/JS9.0000000000002372 (https://doi.org/10.1097/JS9.0000000000002372)
Citations: 0

Abstract

90-Day mortality prediction in elective visceral surgery using machine learning: a retrospective multicenter development, validation and comparison study.

Background: Machine learning (ML) is increasingly being adopted in biomedical research; however, its potential for outcome prediction in visceral surgery remains uncertain. This study compares an aggregated multi-organ ML approach for preoperative 90-day mortality (90DM) prediction with conventional scoring systems and with individual organ models.

Methods: This retrospective cohort study enrolled patients undergoing major elective visceral surgery between 2014 and 2022 across two tertiary centers. Multiple ML models for preoperative 90DM prediction were trained, externally validated and benchmarked against the American Society of Anesthesiologists (ASA) score and the revised Charlson Comorbidity Index (rCCI). Areas under the receiver operating characteristic curve (AUROC) and the precision-recall curve (AUPRC), including standard deviations, were calculated. Additionally, individual models for esophageal, gastric, intestinal, liver, and pancreatic surgery were developed and compared to an aggregated approach.
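
The two metrics named in the Methods can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not say how the curves were integrated or how the standard deviations were obtained, so the code below assumes average precision as the AUPRC estimator and a sample standard deviation over repeated evaluation folds. The function names `auroc`, `average_precision` and `summarize` are hypothetical.

```python
from statistics import mean, stdev


def auroc(y_true, y_score):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Fraction of positive/negative pairs ranked correctly; ties count half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """Average precision, one common step-wise estimate of the AUPRC.

    Tied scores are ordered arbitrarily (assumption for this sketch).
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive case")
    ranked = sorted(zip(y_score, y_true), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / n_pos


def summarize(folds):
    """Mean and sample SD of both metrics over (y_true, y_score) folds."""
    rocs = [auroc(y, s) for y, s in folds]
    prcs = [average_precision(y, s) for y, s in folds]
    return (mean(rocs), stdev(rocs)), (mean(prcs), stdev(prcs))


# Toy data only; 1 marks death within 90 days, scores are predicted risks.
toy_folds = [
    ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
    ([0, 1], [0.2, 0.9]),
]
(roc_mean, roc_sd), (prc_mean, prc_sd) = summarize(toy_folds)
print(f"AUROC: {roc_mean:.2f} [{roc_sd:.2f}]; AUPRC: {prc_mean:.2f} [{prc_sd:.2f}]")
```

The printed format mirrors the "value [SD]" notation used in the Results. With a rare outcome such as 90DM, AUROC and AUPRC can diverge considerably, which is presumably why both are reported.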

Results: A total of 7,711 cases encompassing 78 features were included. Overall 90DM was 4% (n = 309). An XGBoost classifier demonstrated the best performance and high robustness following external validation (AUROC: 0.86 [0.01]; AUPRC: 0.2 [0.04]). All models outperformed the ASA score (AUROC: 0.72; AUPRC: 0.08) and rCCI (AUROC: 0.81; AUPRC: 0.11). rCCI, patient age and C-reactive protein emerged as the most decisive model weights. Models for gastric (AUROC: 0.88 [0.13]; AUPRC: 0.24 [0.26]) and intestinal surgery (AUROC: 0.87 [0.05]; AUPRC: 0.17 [0.09]) showed the highest organ-specific performance, while pancreatic surgery yielded the lowest results (AUROC: 0.66 [0.08]; AUPRC: 0.22 [0.12]). A combined multi-organ approach (AUROC: 0.84 [0.04]; AUPRC: 0.21 [0.06]) demonstrated superiority over the weighted average across all organ-specific models (AUROC: 0.82 [0.07]; AUPRC: 0.2 [0.13]).
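
As a reading aid for the AUPRC figures, recall that a non-informative classifier has an expected AUPRC roughly equal to the outcome prevalence. The short calculation below relates the reported values to that baseline. It assumes the overall prevalence (309 of 7,711) also applies to the cohort on which each AUPRC was measured, which the abstract does not confirm; the dictionary labels are illustrative.

```python
cases = 7711   # total cases reported in the abstract
deaths = 309   # 90-day deaths reported in the abstract

# Expected AUPRC of a non-informative classifier is about the prevalence.
prevalence = deaths / cases

# AUPRC values as reported in the Results.
reported_auprc = {
    "best ML model (external validation)": 0.20,
    "rCCI": 0.11,
    "ASA score": 0.08,
}

# Ratio of each reported AUPRC to the prevalence baseline.
lift = {name: value / prevalence for name, value in reported_auprc.items()}

print(f"prevalence baseline: {prevalence:.3f}")
for name, ratio in lift.items():
    print(f"{name}: {ratio:.1f}x baseline")
# prevalence baseline: 0.040
# best ML model (external validation): 5.0x baseline
# rCCI: 2.7x baseline
# ASA score: 2.0x baseline
```

Under that assumption, an AUPRC of 0.2 is modest in absolute terms but about five times what chance would give at a 4% event rate.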

Conclusion: ML offers robust preoperative risk stratification for 90DM in elective visceral surgery. Leveraging training across multi-organ cohorts may improve accuracy and robustness compared to organ-specific models. Prospective studies are needed to confirm the potential of ML in surgical outcome prediction.

Source journal
CiteScore: 17.70
Self-citation rate: 3.30%
Articles published: 0
Review time: 6-12 weeks
About the journal: The International Journal of Surgery (IJS) has a broad scope, encompassing all surgical specialties. Its primary objective is to facilitate the exchange of crucial ideas and lines of thought between and across these specialties. By doing so, the journal aims to counter the growing trend of increasing sub-specialization, which can result in "tunnel-vision" and the isolation of significant surgical advancements within specific specialties.