Juntong Zeng, Danwei Zhang, Shen Lin, Xiaoting Su, Peng Wang, Yan Zhao, Zhe Zheng
DOI: https://doi.org/10.1093/ehjqcco/qcad028
Publication date: 2024-03-01
Publication type: Journal Article
Citations: 0
Comparative analysis of machine learning vs. traditional modeling approaches for predicting in-hospital mortality after cardiac surgery: temporal and spatial external validation based on a nationwide cardiac surgery registry.
Aims: Preoperative risk assessment is crucial for cardiac surgery. Although previous studies suggested that machine learning (ML) may improve the prediction of in-hospital mortality after cardiac surgery compared with traditional modeling approaches, their validity is in doubt because of a lack of external validation, limited sample sizes, and inadequate modeling considerations. We aimed to compare the predictive performance of ML and traditional modeling approaches while addressing these major limitations.
Methods and results: Adult cardiac surgery cases (n = 168 565) recorded between 2013 and 2018 in the Chinese Cardiac Surgery Registry were used to develop, validate, and compare various ML and logistic regression (LR) models. The dataset was split for a temporal experiment (2013-2017 for training, 2018 for testing) and a spatial experiment (geographically stratified random selection of 83 centers for training and 22 for testing). Model performance was evaluated in the testing sets for discrimination and calibration. The overall in-hospital mortality was 1.9%. In the temporal testing set (n = 32 184), the best-performing ML model achieved an area under the receiver operating characteristic curve (AUC) of 0.797 (95% CI 0.779-0.815), similar to that of the LR model (AUC 0.791 [95% CI 0.775-0.808]; P = 0.12). In the spatial experiment (n = 28 323), the best ML model showed a statistically significant but modest improvement (AUC 0.732 [95% CI 0.710-0.754]) over LR (AUC 0.713 [95% CI 0.691-0.737]; P = 0.002). Varying feature selection methods had relatively smaller effects on ML models. Most ML and LR models were significantly miscalibrated.
Conclusion: ML provided only marginal improvements over traditional modeling approaches in predicting cardiac surgery mortality with routine preoperative variables, which calls for more judicious use of ML in practice.
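The evaluation design described in the abstract (a temporal split by year, a spatial split by center, then discrimination and calibration on the held-out set) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `Case` record layout and all function names are hypothetical, the bootstrap percentile interval is an assumed way of obtaining a 95% CI (the abstract does not state which method was used), and the observed/expected ratio is only one simple calibration summary among several.

```python
import random
from collections import namedtuple

# Hypothetical record layout: surgery year, center ID, predicted risk, outcome (1 = died).
Case = namedtuple("Case", "year center risk died")

def temporal_split(cases, test_year):
    """Train on cases before test_year, test on cases from test_year."""
    train = [c for c in cases if c.year < test_year]
    test = [c for c in cases if c.year == test_year]
    return train, test

def spatial_split(cases, test_centers):
    """Hold out whole centers so no test center contributes training data."""
    held_out = set(test_centers)
    train = [c for c in cases if c.center not in held_out]
    test = [c for c in cases if c.center in held_out]
    return train, test

def auc(scores, labels):
    """Rank-based (Mann-Whitney) AUC; tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank over the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one event and one non-event")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def bootstrap_auc_ci(scores, labels, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

def observed_expected_ratio(scores, labels):
    """Observed deaths divided by the sum of predicted risks; 1.0 means no overall bias."""
    return sum(labels) / sum(scores)

if __name__ == "__main__":
    # Synthetic toy data, for illustration only.
    rng = random.Random(1)
    cases = []
    for _ in range(2000):
        risk = rng.random() * 0.1
        cases.append(Case(rng.randint(2013, 2018), rng.randint(1, 10),
                          risk, int(rng.random() < risk)))
    _, test = temporal_split(cases, 2018)
    scores = [c.risk for c in test]
    labels = [c.died for c in test]
    print("AUC:", auc(scores, labels))
    print("95% CI:", bootstrap_auc_ci(scores, labels))
    print("O/E:", observed_expected_ratio(scores, labels))
```

Holding out entire centers, rather than random rows, is what makes the spatial experiment an external validation: a model cannot benefit from having seen other patients from the same hospital during training.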
About the journal:
ACS Applied Energy Materials is an interdisciplinary journal publishing original research covering all aspects of materials, engineering, chemistry, physics and biology relevant to energy conversion and storage. The journal is devoted to reports of new and original experimental and theoretical research of an applied nature that integrate knowledge in the areas of materials, engineering, physics, bioscience, and chemistry into important energy applications.