Risk-Specific Training Cohorts to Address Class Imbalance in Surgical Risk Prediction

Impact factor 15.7; Zone 1 (Medicine); JCR Q1, SURGERY
Jeremy A Balch, Matthew M Ruppert, Ziyuan Guan, Timothy R Buchanan, Kenneth L Abbott, Benjamin Shickel, Azra Bihorac, Muxuan Liang, Gilbert R Upchurch, Christopher J Tignanelli, Tyler J Loftus
JAMA Surgery, published 2024-10-09 (Journal Article)
DOI: 10.1001/jamasurg.2024.4299
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11465118/pdf/
Citations: 0

Abstract


Importance: Machine learning tools are increasingly deployed for risk prediction and clinical decision support in surgery. Class imbalance adversely impacts predictive performance, especially for low-incidence complications.

Objective: To evaluate risk-prediction model performance when trained on risk-specific cohorts.

Design, setting, and participants: This cross-sectional study, performed from February 2024 to July 2024, deployed a deep learning model that generated risk scores for common postoperative complications. A total of 109 445 inpatient operations performed at 2 University of Florida Health hospitals from June 1, 2014, to May 5, 2021, were examined.

Exposures: The model was trained de novo on separate cohorts of high-risk, medium-risk, and low-risk Current Procedural Terminology (CPT) codes, defined empirically by the incidence of 5 postoperative complications: (1) in-hospital mortality; (2) prolonged intensive care unit (ICU) stay (≥48 hours); (3) prolonged mechanical ventilation (≥48 hours); (4) sepsis; and (5) acute kidney injury (AKI). Low-risk and high-risk cutoffs for each complication were defined by the lower-third and upper-third prevalence in the dataset, except for mortality, for which the cutoffs were set at 1% or less (low risk) and greater than 3% (high risk).
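
The cohort definition above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes each operation is reduced to a (CPT code, complication yes/no) pair, computes prevalence per CPT code, and reads "lower-third and upper-third prevalence" as tertile cut points over the per-code prevalences (the abstract does not say whether tertiles were taken over codes or over operations). The function names, the toy codes "A" to "D", and the boundary handling (low if at or below the lower cutoff, high if above the upper cutoff, mirroring the mortality rule) are hypothetical.

```python
from collections import defaultdict
from statistics import quantiles

# Hypothetical helpers for illustration; not taken from the study.
MORTALITY_CUTOFFS = (0.01, 0.03)  # low risk: <= 1%; high risk: > 3%


def cpt_prevalence(operations):
    """Per-CPT-code complication prevalence from (cpt_code, had_event) pairs."""
    counts = defaultdict(lambda: [0, 0])  # cpt -> [events, operations]
    for cpt, event in operations:
        counts[cpt][0] += int(event)
        counts[cpt][1] += 1
    return {cpt: events / n for cpt, (events, n) in counts.items()}


def tertile_cutoffs(prevalence):
    """Lower- and upper-third cut points over the per-code prevalences (assumed reading)."""
    low_cut, high_cut = quantiles(prevalence.values(), n=3)
    return low_cut, high_cut


def assign_risk_class(prev, low_cut, high_cut):
    """Low if prev <= low_cut, high if prev > high_cut, otherwise medium."""
    if prev <= low_cut:
        return "low"
    if prev > high_cut:
        return "high"
    return "medium"


def split_into_cohorts(operations, risk_class):
    """Group operations by the risk class of their CPT code; one model per cohort."""
    cohorts = defaultdict(list)
    for cpt, event in operations:
        cohorts[risk_class[cpt]].append((cpt, event))
    return dict(cohorts)


# Invented toy data for the mortality outcome.
ops = ([("A", 0)] * 200 + [("B", 1)] + [("B", 0)] * 99
       + [("C", 1)] * 2 + [("C", 0)] * 98 + [("D", 1)] * 5 + [("D", 0)] * 95)
prev = cpt_prevalence(ops)
classes = {cpt: assign_risk_class(p, *MORTALITY_CUTOFFS) for cpt, p in prev.items()}
cohorts = split_into_cohorts(ops, classes)
print(classes)  # {'A': 'low', 'B': 'low', 'C': 'medium', 'D': 'high'}
```

For the other four complications, the same `assign_risk_class` call would take the output of `tertile_cutoffs(prev)` instead of the fixed mortality cutoffs.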

Main outcomes and measures: Model performance metrics were assessed for each risk-specific cohort alongside the baseline model. Metrics included area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), F1 scores, and accuracy for each model.
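
The four metrics can be computed from labels and predicted risk scores without external libraries. This is a sketch under assumptions: binary labels, a 0.5 decision threshold for F1 and accuracy (the abstract does not state the threshold), and average precision (step-wise summation over distinct score thresholds) as the AUPRC estimator; the study may have used a different estimator. The second example shows why class imbalance matters: at 1% prevalence, a model that flags no one reaches 99% accuracy while its F1 score is 0 and its AUPRC equals the prevalence.

```python
def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Step-wise AUPRC: sum of recall gain x precision over distinct score thresholds."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(y_true)
    tp = fp = 0
    ap, prev_recall, i = 0.0, 0.0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Cases tied at the same score form a single threshold.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
    return ap


def f1_and_accuracy(y_true, scores, threshold=0.5):
    """F1 = 2TP / (2TP + FP + FN) and accuracy after thresholding the scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 0)
    correct = sum(1 for y, p in zip(y_true, preds) if y == p)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return f1, correct / len(y_true)


y = [1, 0, 1, 0, 0]
s = [0.9, 0.8, 0.7, 0.3, 0.1]
print(round(auroc(y, s), 3), round(average_precision(y, s), 3), f1_and_accuracy(y, s))
# 0.833 0.833 (0.8, 0.8)

y_rare = [1] + [0] * 99   # 1% prevalence
flat = [0.0] * 100        # uninformative model that never crosses the threshold
print(f1_and_accuracy(y_rare, flat), auroc(y_rare, flat), average_precision(y_rare, flat))
# (0.0, 0.99) 0.5 0.01
```

AUROC is insensitive to prevalence, whereas the AUPRC of an uninformative model sits at the prevalence, which is why AUPRC gains are most informative for rare outcomes such as in-hospital mortality.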

Results: A total of 109 445 inpatient operations were examined among patients treated at 2 University of Florida Health hospitals in Gainesville (77 921 procedures [71.2%]) and Jacksonville (31 524 procedures [28.8%]). Median (IQR) patient age was 58 (43-68) years, and median (IQR) Charlson Comorbidity Index score was 2 (0-4). Of the 109 445 operations, 55 646 (50.8%) were performed in male patients, and 66 495 (60.8%) were nonemergent inpatient operations. Training on the high-risk cohort had a variable impact on AUROC but significantly improved AUPRC (as assessed by nonoverlapping 95% confidence intervals) for predicting mortality (0.53; 95% CI, 0.43-0.64), AKI (0.61; 95% CI, 0.58-0.65), and prolonged ICU stay (0.91; 95% CI, 0.89-0.92). It also significantly improved the F1 score for mortality (0.42; 95% CI, 0.36-0.49), prolonged mechanical ventilation (0.55; 95% CI, 0.52-0.58), sepsis (0.46; 95% CI, 0.43-0.49), and AKI (0.57; 95% CI, 0.54-0.59). After controlling for baseline model performance on high-risk cohorts, AUPRC increased significantly for in-hospital mortality only (0.53; 95% CI, 0.42-0.65 vs 0.29; 95% CI, 0.21-0.40).
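
The significance criterion used in the Results, nonoverlapping 95% confidence intervals, reduces to an interval check. The sketch below applies it to the reported in-hospital mortality AUPRC comparison on the high-risk cohort and recomputes the cohort percentages from the stated counts; `cis_overlap` and the dictionary keys are hypothetical names chosen for illustration.

```python
def cis_overlap(ci_a, ci_b):
    """True if two closed intervals (lower, upper) share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


# In-hospital mortality AUPRC on the high-risk cohort, as reported.
risk_specific_ci = (0.42, 0.65)
baseline_ci = (0.21, 0.40)
print(cis_overlap(risk_specific_ci, baseline_ci))  # False: the intervals do not overlap

# Percentages in the cohort description, recomputed from the reported counts.
TOTAL = 109_445
counts = {"Gainesville": 77_921, "Jacksonville": 31_524, "male": 55_646, "nonemergent": 66_495}
shares = {k: round(100 * n / TOTAL, 1) for k, n in counts.items()}
print(shares)  # {'Gainesville': 71.2, 'Jacksonville': 28.8, 'male': 50.8, 'nonemergent': 60.8}
```

This criterion is conservative: two estimates can differ significantly even when their 95% confidence intervals partly overlap, so nonoverlap is sufficient but not necessary for a significant difference.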

Conclusion and relevance: In this cross-sectional study, training separate models using a priori knowledge of procedure-specific risk classes improved performance on standard evaluation metrics, especially for low-prevalence complications such as in-hospital mortality. Used cautiously, this approach may represent an optimal training strategy for surgical risk-prediction models.

Source journal: JAMA Surgery (SURGERY)
CiteScore: 20.80
Self-citation rate: 3.60%
Articles published: 400
Journal description: JAMA Surgery, an international peer-reviewed journal established in 1920, is the official publication of the Association of VA Surgeons, the Pacific Coast Surgical Association, and the Surgical Outcomes Club. It is a member of the JAMA Network, a consortium of peer-reviewed general medical and specialty publications.