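Comparative analysis of machine learning vs. traditional modeling approaches for predicting in-hospital mortality after cardiac surgery: temporal and spatial external validation based on a nationwide cardiac surgery registry.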

IF 4.8 | Zone 2, Medicine | Q1 | CARDIAC & CARDIOVASCULAR SYSTEMS

European Heart Journal - Quality of Care and Clinical Outcomes | Pub Date: 2024-03-01 | DOI: 10.1093/ehjqcco/qcad028

Juntong Zeng, Danwei Zhang, Shen Lin, Xiaoting Su, Peng Wang, Yan Zhao, Zhe Zheng


Citations: 0

Abstract


Comparative analysis of machine learning vs. traditional modeling approaches for predicting in-hospital mortality after cardiac surgery: temporal and spatial external validation based on a nationwide cardiac surgery registry.

Aims: Preoperative risk assessment is crucial for cardiac surgery. Although previous studies suggested that machine learning (ML) may improve prediction of in-hospital mortality after cardiac surgery compared with traditional modeling approaches, the validity of these findings has been questioned because of a lack of external validation, limited sample sizes, and inadequate modeling considerations. We aimed to compare the predictive performance of ML and traditional modeling approaches while addressing these major limitations.

Methods and results: Adult cardiac surgery cases (n = 168 565) between 2013 and 2018 in the Chinese Cardiac Surgery Registry were used to develop, validate, and compare various ML models against logistic regression (LR) models. The dataset was split for a temporal experiment (2013-2017 for training, 2018 for testing) and a spatial experiment (geographically stratified random selection of 83 centers for training, 22 for testing). Model performance was evaluated in the testing sets for discrimination and calibration. The overall in-hospital mortality was 1.9%. In the temporal testing set (n = 32 184), the best-performing ML model had an area under the receiver operating characteristic curve (AUC) of 0.797 (95% CI 0.779-0.815), similar to that of the LR model (AUC 0.791 [95% CI 0.775-0.808]; P = 0.12). In the spatial experiment (n = 28 323), the best ML model showed a statistically significant but modest improvement (AUC 0.732 [95% CI 0.710-0.754]) over LR (AUC 0.713 [95% CI 0.691-0.737]; P = 0.002). Varying the feature selection method had relatively small effects on ML models. Most ML and LR models were significantly miscalibrated.
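The temporal and spatial splits described above can be illustrated with a minimal sketch. This is not the authors' code: the record fields (`year`, `center`, `region`, `died`), the helper names, the 20% hold-out fraction, and the synthetic data are all assumptions made for illustration. The point it shows is that the spatial experiment holds out whole centers, sampled within each region, so that no center contributes cases to both training and testing.

```python
import random
from collections import defaultdict


def temporal_split(cases, last_train_year=2017):
    """Train on cases up to last_train_year, test on all later cases."""
    train = [c for c in cases if c["year"] <= last_train_year]
    test = [c for c in cases if c["year"] > last_train_year]
    return train, test


def spatial_split(cases, test_fraction=0.2, seed=0):
    """Hold out whole centers, sampled at random within each region."""
    rng = random.Random(seed)
    centers_by_region = defaultdict(set)
    for c in cases:
        centers_by_region[c["region"]].add(c["center"])

    test_centers = set()
    for region in sorted(centers_by_region):
        centers = sorted(centers_by_region[region])
        rng.shuffle(centers)
        # At least one held-out center per region (an assumption of this sketch)
        n_test = max(1, round(len(centers) * test_fraction))
        test_centers.update(centers[:n_test])

    train = [c for c in cases if c["center"] not in test_centers]
    test = [c for c in cases if c["center"] in test_centers]
    return train, test, test_centers


def make_synthetic_cases(n=2000, seed=1):
    """Synthetic stand-in data: 20 hypothetical centers in 4 hypothetical regions."""
    rng = random.Random(seed)
    centers = [(f"center_{i:02d}", f"region_{i % 4}") for i in range(20)]
    cases = []
    for _ in range(n):
        center, region = rng.choice(centers)
        cases.append({
            "year": rng.randint(2013, 2018),
            "center": center,
            "region": region,
            "died": rng.random() < 0.02,  # arbitrary event rate for the toy data
        })
    return cases


cases = make_synthetic_cases()
t_train, t_test = temporal_split(cases)
s_train, s_test, held_out = spatial_split(cases)
print(len(t_train), len(t_test), len(s_train), len(s_test), sorted(held_out))
```

Splitting by center rather than by case is what makes the spatial test an external validation: a case-level random split would let the model see other patients from the same hospitals it is later tested on.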

Conclusion: ML provided only marginal improvements over traditional modeling approaches in predicting cardiac surgery mortality with routine preoperative variables, which calls for more judicious use of ML in practice.
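The abstract evaluates models on discrimination (AUC) and calibration. The following sketch shows one common way to compute each from predicted probabilities, under stated assumptions: the function names are hypothetical, the AUC uses the rank-based (Mann-Whitney) formulation, and the observed-to-expected ratio is used here as a simple calibration summary. The abstract does not say which calibration measure or which statistical test for comparing AUCs the authors used, so none of this should be read as their method.

```python
def auc(y_true, y_score):
    """AUC as the chance that a random death is scored above a random survivor.

    Uses the rank-sum (Mann-Whitney) formulation; tied scores get average ranks,
    which is equivalent to counting a tied pair as 0.5.
    """
    pairs = sorted(zip(y_score, y_true))
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        j = i
        # Extend j to the end of the run of tied scores
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one event and one non-event")

    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def observed_expected_ratio(y_true, y_prob):
    """Observed deaths divided by the sum of predicted probabilities.

    A value near 1 suggests good calibration overall; above 1 means the model
    underpredicts risk on average, below 1 means it overpredicts.
    """
    return sum(y_true) / sum(y_prob)


# Toy example with made-up outcomes and predictions
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]
print(auc(y_true, y_prob))
print(observed_expected_ratio(y_true, y_prob))
```

The two measures answer different questions, which is why a model can rank patients reasonably well (AUC near 0.8) and still be miscalibrated: AUC depends only on the ordering of the predictions, while the observed-to-expected ratio depends on their absolute values.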


Source journal

European Heart Journal - Quality of Care and Clinical Outcomes (CARDIAC & CARDIOVASCULAR SYSTEMS)

CiteScore: 9.40

Self-citation rate: 3.80%

Journal description: European Heart Journal - Quality of Care & Clinical Outcomes is an English language, peer-reviewed journal dedicated to publishing cardiovascular outcomes research. It serves as an official journal of the European Society of Cardiology and maintains a close alliance with the European Heart Health Institute. The journal disseminates original research and topical reviews contributed by health scientists globally, with a focus on the quality of care and its impact on cardiovascular outcomes at the hospital, national, and international levels. It provides a platform for presenting the most outstanding cardiovascular outcomes research to influence cardiovascular public health policy on a global scale. Additionally, the journal aims to motivate young investigators and foster the growth of the outcomes research community.