Development and validation of artificial intelligence models for early detection of postoperative infections (PERISCOPE): a multicentre study using electronic health record data
Siri L. van der Meijden, Anna M. van Boekel, Laurens J. Schinkelshoek, Harry van Goor, Ewout W. Steyerberg, Rob G.H.H. Nelissen, Dieter Mesotten, Bart F. Geerts, Mark G.J. de Boer, M. Sesmu Arbous
Lancet Regional Health - Europe, volume 49, Article 101163. Published 2025-02-01. DOI: 10.1016/j.lanepe.2024.101163
Full text: https://www.sciencedirect.com/science/article/pii/S2666776224003326 (PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11667051/pdf/)
Abstract
Background
Postoperative infections significantly impact patient outcomes and costs, exacerbated by late diagnoses, yet early reliable predictors are scarce. Existing artificial intelligence (AI) models for postoperative infection prediction often lack external validation or perform poorly in local settings when validated. We aimed to develop locally valid models as part of the PERISCOPE AI system to enable early detection, safer discharge, and more timely treatment of patients.
Methods
We developed and validated XGBoost models to predict postoperative infections within 7 and 30 days of surgery. The models used retrospective pre-operative and intra-operative electronic health record data from 2014 to 2023 across various surgical specialities. They were developed at Hospital A and then validated and updated at Hospitals B and C, in the Netherlands and Belgium. Model performance was evaluated before and after updating, using the two most recent years of data as temporal validation datasets. Main outcome measures were model discrimination (area under the receiver operating characteristic curve (AUROC)), calibration (slope, intercept, and plots), and clinical utility (decision curve analysis with net benefit).
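Two of the outcome measures named above, AUROC and net benefit, can be written down compactly. The following is a minimal sketch in plain Python, not the study's code: the function names and the toy data are hypothetical, and the net benefit formula is the standard decision curve definition (true positives per patient minus false positives per patient, weighted by the odds of the threshold).

```python
from typing import Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a randomly chosen infected case receives a higher
    score than a randomly chosen non-infected case (ties count one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], probs: Sequence[float],
                threshold: float) -> float:
    """Net benefit of acting on every patient with predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    weight = threshold / (1.0 - threshold)  # odds of the decision threshold
    return tp / n - weight * fp / n


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Net benefit of the default strategy that acts on every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data for illustration only (not from the study).
toy_outcomes = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1]
toy_risks = [0.05, 0.10, 0.10, 0.20, 0.30, 0.35, 0.60, 0.15, 0.80, 0.70]

if __name__ == "__main__":
    print("AUROC:", auroc(toy_outcomes, toy_risks))
    for t in (0.10, 0.25, 0.50):
        print(t, net_benefit(toy_outcomes, toy_risks, t),
              net_benefit_treat_all(toy_outcomes, t))
```

The "treat none" strategy has a net benefit of zero at every threshold, so a model is useful at a given threshold when its net benefit exceeds both zero and the treat-all value.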
Findings
The study included 253,010 surgical procedures with 23,903 infections within 30 days. Discriminative performance, calibration properties, and clinical utility significantly improved after updating. Final AUROCs after updating for Hospitals A, B, and C were 0.82 (95% confidence interval (CI) 0.81–0.83), 0.82 (95% CI 0.81–0.83), and 0.91 (95% CI 0.90–0.91), respectively, for 30-day predictions on the temporal validation datasets (2022–2023). Calibration plots demonstrated adequate correspondence between observed outcomes and predicted risk. All local models were deemed clinically useful, as their net benefit was higher than that of the default strategies (treat all and treat none) over a wide range of clinically relevant decision thresholds.
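Calibration slope and intercept are commonly estimated by a logistic regression of the observed outcome on the logit of the predicted risk. The abstract does not state which estimation procedure was used, so the following is only a sketch under that assumption: the function names are hypothetical, and the intercept and slope are fitted jointly by Newton-Raphson (the intercept is often instead estimated with the slope fixed at 1).

```python
import math
from typing import Sequence, Tuple


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def calibration_intercept_slope(y_true: Sequence[int],
                                probs: Sequence[float],
                                n_iter: int = 50,
                                eps: float = 1e-6) -> Tuple[float, float]:
    """Fit y ~ sigmoid(a + b * logit(p)) by Newton-Raphson; return (a, b).

    A slope b below 1 suggests predictions that are too extreme; an
    intercept a away from 0 suggests systematic over- or under-estimation.
    """
    # Clip probabilities so that the logit stays finite.
    x = [logit(min(max(p, eps), 1.0 - eps)) for p in probs]
    a, b = 0.0, 1.0  # start from perfect calibration
    for _ in range(n_iter):
        g_a = g_b = 0.0           # gradient of the log-likelihood
        h_aa = h_ab = h_bb = 0.0  # entries of the information matrix
        for xi, yi in zip(x, y_true):
            mu = sigmoid(a + b * xi)
            w = mu * (1.0 - mu)
            g_a += yi - mu
            g_b += (yi - mu) * xi
            h_aa += w
            h_ab += w * xi
            h_bb += w * xi * xi
        det = h_aa * h_bb - h_ab * h_ab
        if abs(det) < 1e-12:
            break  # information matrix is singular; stop updating
        da = (h_bb * g_a - h_ab * g_b) / det
        db = (h_aa * g_b - h_ab * g_a) / det
        a += da
        b += db
        if max(abs(da), abs(db)) < 1e-10:
            break
    return a, b
```

With predictions that match the observed event fractions exactly, this fit returns an intercept near 0 and a slope near 1, which is the pattern a calibration plot close to the diagonal corresponds to.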
Interpretation
PERISCOPE can accurately predict overall postoperative infections within 7 and 30 days after surgery. The robust performance implies potential for improving clinical care in diverse clinical target populations. This study supports the need for approaches to local updating of AI models to account for domain shifts in patient populations and data distributions across different clinical settings.
Funding
This study was funded by a REACT EU grant from the European Regional Development Fund (ERDF) and Kansen voor West.
About the journal:
The Lancet Regional Health – Europe, a gold open access journal, is part of The Lancet's global effort to promote healthcare quality and accessibility worldwide. It focuses on advancing clinical practice and health policy in the European region to enhance health outcomes. The journal publishes high-quality original research advocating changes in clinical practice and health policy. It also includes reviews, commentaries, and opinion pieces on regional health topics, such as infection and disease prevention, healthy aging, and reducing health disparities.